4 Data Science for Librarians

Another way to view data is that the digital version "encodes" for practical purposes, while analog tries to be "just like" what is being observed (Breuer, 2005).

Nondigital data is at a greater risk of loss. Whereas digital data can be encrypted and stored in the cloud, nondigital data is usually stored on some type of physical medium, whether that is the magnetic tape of a VCR cassette, the surface grooves of a vinyl record, or any other nondigital medium. A useful way to characterize nondigital data is that it exists without being measured. For nondigital data to be converted into digital form, it has to be both captured and rendered with the help of specific technologies. Digital data typically uses a simple and convenient binary system to build data sets that represent video, image, or audio input.

Most everyday technology works with digital rather than analog technology. For example, smartphones receive and transmit calls by efficiently converting the sounds of an individual's voice into numbers and then transmitting these numbers from one place to another in the form of radio waves. When used in this way, digital technology has several advantages. For one thing, it is easier and more convenient to store data in digital form because it usually takes up less "room."

One simple and common example of the main difference between nondigital and digital data is moving water. The nondigital data is the real water surface in motion, which human senses perceive largely as subtle changes in color as well as the physical motion, texture, and sometimes even smell of the water. On the other hand, a digital format converts the color properties, the physical movement, or both into separate data sets that simulate these properties on a hardware interface or that are stored for research purposes.

What Is Big Data?
Big Data can be defined as a large volume of structured, semi-structured, and unstructured data that can be mined for information and used in machine learning projects and other advanced analytics applications. NIST defines Big Data as "the data of which volume, acquisition speed, or data representation limits the capacity of using traditional relational methods to conduct effective analysis or the data which may be effectively processed with important horizontal zoom technologies."

Big Data is often associated with the 3Vs, 4Vs, and 5Vs. Doug Laney attempted to define Big Data in 2001 by stating that Big Data represents the 3Vs: volume, velocity, and variety. The 3Vs formed the basic and original characteristics of Big Data. Volume refers to the sizable amount of data, velocity to the rate at which data is created over time, and variety to the different types of data available. The 3Vs are well summarized in the definition of Big Data given by AnnaLee Saxenian (2014), Dean of the University of California, Berkeley School of Information, in which she states, "Big Data is data that can't be processed using standard databases because it is too big, too fast-moving, or too complex for traditional data processing tools."

The definition of Big Data continues to evolve, as is evident from the number of Vs that continue to be added to its characteristics. After the 3Vs, a fourth V was added: veracity, which refers to data accuracy. The 4Vs of Big Data are the most often cited and used in research; however, some people claim that the number of Vs could go higher than 10. But before