have now made it hard to aggregate, analyze, and visualize information efficiently. This is why it is so important to counter the data deluge effectively, which can be done by retrieving only the amount of data actually needed. Doing so not only saves money and resources but also yields the required, valuable insights. The biggest challenge is usually posed by the huge volume of unstructured data, such as video, text, images, and audio, that does not fit into the rows and columns of a single relational database. Usually obtained in forms such as MS Office documents, e-mails, instant messages, or social media posts, this data is not only tricky to analyze but can cause storage problems as well.

There are many options for storing Big Data, from flash storage and hybrid clouds to cold-storage archiving and intelligent software-defined storage (I-SDS). Although storage itself is quite cheap these days, the other costs associated with the energy consumption and maintenance of large data centers can be astronomical. Security is another key concern when it comes to Big Data, whether it is stored on local infrastructure or in the cloud. Because data is gathered from various sources and distributed computing is common in data analytics, it is not surprising that many avenues are open for security or data breaches. And because we are talking about Big Data, any breach in data security is likely to compromise a huge amount of crucial information.

Open Data

Open data can be defined as data that can be freely used, distributed, reused, and redistributed by anyone. Open data should be available in bulk (meaning the dataset files can be easily downloaded), and it should be available to everyone either free of charge or at no more than a reasonable reproduction cost. The information has to be digital, preferably available for download over the Internet, and easily and quickly processed by any computer. Otherwise, most users cannot fully leverage the power of the data, that is, combine it to create new and improved insights. There are two key dimensions of data openness:

1. Legally Open: The data should be legally open, meaning it has to be either in the public domain or placed under liberal terms of use with minimal restrictions.

2. Technically Open: The data must also be technically open, meaning it should be published in an electronic format that is nonproprietary and machine readable, so that everyone can easily access and use the data with common and free software tools.

In addition, data must be available to the public and accessible on public servers, without firewall or password restrictions. Open data can also include non-textual material such as genomes, maps, connectomes, mathematical and scientific formulae, chemical compounds, medical data, biodiversity data, and bioscience data. It is worth pointing out that problems tend to arise when these materials are commercially valuable or can be aggregated into valuable works. In some cases, access to and reuse of the data are monitored and controlled by organizations, both private and public. Currently, more governments are