Data Science for Librarians

Another way to view data is that the digital version “encodes” what is observed for practical purposes, while the analog version tries to be “just like” what is being observed (Breuer, 2005). Nondigital data is at greater risk of loss. Whereas digital data can be encrypted and stored in the cloud, nondigital data is usually stored on some type of physical medium, whether that is the magnetic tape of a VCR cassette, the surface grooves of a vinyl record, or any other nondigital medium. A useful way to characterize nondigital data is that it exists without being measured. For nondigital data to be converted into digital form, it has to be both captured and rendered with the help of specific technologies. Digital data typically uses a simple and convenient binary system to build data sets that represent video, image, or audio input.

Most everyday technology now works with digital rather than analog signals. For example, smartphones receive and transmit calls by converting the sound of a person’s voice into numbers and then transmitting these numbers from one place to another in the form of radio waves. Used in this way, digital technology has several advantages. For one thing, it is easier and more convenient to store data in digital form because it usually takes up less “room.”

One simple and common example of the difference between nondigital and digital data is moving water. The nondigital data is the real water surface in motion, which human senses perceive largely as subtle changes in the color, physical motion, texture, and sometimes even smell of the water.
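
As a minimal sketch of what “capturing” a nondigital signal involves, the Python snippet below samples a continuous wave (a stand-in for a voice or for the height of a water surface) at fixed intervals and rounds each sample to one of 256 levels, so that each value fits in 8 binary digits. The function names, the sine-wave signal, the sampling rate, and the 8-bit depth are all illustrative assumptions and do not come from the text.

```python
import math

def analog_signal(t):
    """Hypothetical continuous signal: a 2 Hz sine wave with values in [-1, 1]."""
    return math.sin(2 * math.pi * 2 * t)

def digitize(signal, duration_s, sample_rate_hz, bits=8):
    """Sample `signal` at fixed intervals and quantize each sample to an integer."""
    levels = 2 ** bits                      # 256 possible values for 8 bits
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz              # time of the i-th sample, in seconds
        value = signal(t)                   # continuous value in [-1, 1]
        # Map [-1, 1] onto the integer range [0, levels - 1].
        code = round((value + 1) / 2 * (levels - 1))
        samples.append(code)
    return samples

# One second of the signal, sampled 16 times (an assumed, very low rate).
samples = digitize(analog_signal, duration_s=1.0, sample_rate_hz=16)

# Each sample written as 8 binary digits, the form in which it would be stored.
binary = [format(s, "08b") for s in samples]
```

Everything between the sample points, and any detail finer than one of the 256 levels, is discarded. That loss is the practical meaning of the digital version “encoding” the signal rather than being “just like” it.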
On the other hand, a digital format will convert the color properties, the physical movement, or both into separate data sets that simulate these properties on a hardware interface or that are stored for important research purposes.

What Is Big Data?

Big Data can be defined as a large volume of structured, semi-structured, and unstructured data that can be mined for information and used in machine learning projects and other advanced analytics applications. NIST defines Big Data as “the data of which volume, acquisition speed, or data representation limits the capacity of using traditional relational methods to conduct effective analysis or the data which may be effectively processed with important horizontal zoom technologies.”

Big Data is often associated with the 3Vs, 4Vs, and 5Vs. Doug Laney attempted to define Big Data in 2001 by stating that Big Data represents the 3Vs: volume, velocity, and variety. The 3Vs formed the original, basic characteristics of Big Data. Volume refers to the sizable amount of data, velocity to the rate at which data is created over time, and variety to the different types of available data. The 3Vs are well summarized in the definition of Big Data given by AnnaLee Saxenian (2014), Dean of the University of California, Berkeley School of Information: “Big Data is data that can’t be processed using standard databases because it is too big, too fast-moving, or too complex for traditional data processing tools.”

The definition of Big Data continues to evolve, as is evident from the number of Vs that continue to be added to its characteristics. After the 3Vs, a fourth V was added: veracity, which refers to data accuracy. The 4Vs of Big Data are the most often cited and used in research; however, some people claim that the number of Vs could go higher than 10. But before