Page 64 - AC/E Digital Culture Annual Report

One of the few articles to have shed some light on the matter is entitled “Undefined by Data: A Survey of Big Data Definitions”. The authors (Ward and Barker, 2013) collate the various definitions of “Big Data” provided by major technology companies such as Oracle, Intel and Microsoft, as well as by a few previous reports. In general, the definitions combine two important ideas: storing a large volume of data (some authors speak of 500 Terabytes per week), and analysing this data quantitatively and visually to find patterns, establish laws and predict behaviour.
The classic definition of “Big Data” rests on a formula that is easy to understand and memorise, the three Vs: Volume (Terabytes, Petabytes, Exabytes), Velocity (data that is constantly generated) and Variety (texts, images, sounds) (Ward and Barker, 2013). Some reports have since added a fourth V, Veracity. However, this volume-based definition of Big Data only makes sense if we consider blogs, social media and sensors to be the main sources of data.
In contrast, the classic objects of study in the humanities are usually texts and analogue images, which have fortunately been digitised and published in machine-readable format. In other words, if we take the three Vs as a basis, we have to admit that we cannot speak of Big Data in the strict sense in the humanities. For one thing, the classic works of Spanish Golden Age poetry fit onto a 4 GB pen drive; for another, archives and libraries do not constantly produce new data at high speed on our poets, writers or artists (or rather, such data is not accessible to researchers). As for variety, we are dealing with image files in TIFF, JPEG or a similar format, and with semi-structured text in XML format or, without markup, in TXT format.
Before the advent of Google Books in 2004, digital humanists worked to digitise corpora of texts and images in the form of digital editions, libraries and archives.
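To put the volume criterion of the three Vs in perspective, here is a rough worked comparison between the 500 Terabytes per week that some authors cite and the 4 GB pen drive that holds the Golden Age poetry corpus. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and treats 4 GB as an upper bound on the corpus size, so the result is a lower bound:

```latex
\frac{500\ \text{TB/week}}{4\ \text{GB}}
  = \frac{500 \times 10^{12}\ \text{bytes/week}}{4 \times 10^{9}\ \text{bytes}}
  = 1.25 \times 10^{5}\ \text{corpora per week}
```

At that rate, a single week of data would amount to at least 125,000 copies of the corpus. With binary units (TiB and GiB), the figure is 128,000. Either way, the gap is roughly five orders of magnitude.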
The European Association for Digital Humanities (EADH) provides a good example of the kind of projects carried out. Since 2015, the association has devoted a space on its website to documenting and promoting access to European Digital Humanities projects conducted in the past five years. The initiative is participatory: any researcher (whether or not they belong to the association) can fill in the form available on the website and submit a description of their project, giving the project's name, a descriptive summary and the collaborating institutions or team in charge, among other fields. At the time of writing, the association has received 175 submissions. If the titles and summaries are analysed with Voyant, a web-based text-analysis tool that, among other things, counts the most frequently used words, it is easy to see that the projects abound in words related to the subject of this article, such as “data”, “information” and “database”, and in others that denote the scale or size of the project, including “archive”, “collection”, “platform” and “library”.
[Figure: Words used most frequently to describe Digital Humanities projects in Europe. CC-BY]
The current state of the Digital Humanities in Europe can be gauged by three aspects: projects, tools and research groups. Prominent among the projects for making digital texts available online are the Oxford Text Archive, the Deutsches Textarchiv, the Eighteenth-Century Poetry Archive and DigiLibt. Tools for textual analysis include Alcide, CATMA and the R package Stylo. Infrastructures and research groups such as CLARIN, CLiGS and the Electronic Text Reuse Acquisition Project are also important. These initiatives use algorithms to attribute authorship of texts (Burrows, 2002), discover latent themes underlying a large group of texts (Blei, 2012), or detect cases of intertextuality in several authors’ literary output (Ganascia, Glaudes and Del Lungo, 2015). Suffice it to say
BIG DATA IN THE DIGITAL HUMANITIES · ANTONIO ROJAS
Smart culture.
Analysis of digital trends
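Two of the text-analysis operations mentioned above can be sketched in a few lines of Python: counting the most frequent words in a set of titles and summaries, as Voyant does, and attributing authorship (Burrows, 2002). This is a minimal illustration under stated assumptions, not how Voyant or the cited tools are actually implemented. The function names (`tokenize`, `top_words`, `burrows_delta`), the tiny stop-word list and the one-text-per-candidate-author setup are hypothetical. `burrows_delta` follows the usual description of Burrows's Delta: take the most frequent words of the known corpus, turn each text's relative frequencies into z-scores, and average the absolute differences. The candidate with the lowest Delta is the closest stylistically.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters (accented letters included), dropping digits and punctuation.
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(text, n, stopwords=frozenset()):
    # Voyant-style frequency list (hypothetical helper): count every token except stop words.
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)


def relative_frequencies(tokens, vocabulary):
    # Occurrences of each vocabulary word per token of running text.
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocabulary]


def burrows_delta(known_texts, disputed_text, n_mfw=30):
    """Return {author: Delta}; the smallest Delta marks the most likely author.

    Simplifying assumption: one known text per candidate author.
    """
    tokenized = {author: tokenize(text) for author, text in known_texts.items()}

    # 1. Most frequent words of the pooled known corpus. Stop words are kept on purpose:
    #    function words such as "the" or "of" carry much of the stylistic signal.
    pooled = Counter()
    for tokens in tokenized.values():
        pooled.update(tokens)
    mfw = [word for word, _ in pooled.most_common(n_mfw)]

    # 2. Relative-frequency profile of each known author over those words.
    profiles = {a: relative_frequencies(t, mfw) for a, t in tokenized.items()}

    # 3. Mean and population standard deviation of each word across the known authors.
    columns = list(zip(*profiles.values()))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
            for col, m in zip(columns, means)]

    def z_scores(profile):
        # A word with no spread across authors cannot discriminate, so it scores 0.
        return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(profile, means, stds)]

    # 4. Delta = mean absolute difference between the z-scores of two texts.
    z_disputed = z_scores(relative_frequencies(tokenize(disputed_text), mfw))
    return {
        author: sum(abs(x - y) for x, y in zip(z_scores(p), z_disputed)) / len(mfw)
        for author, p in profiles.items()
    }


if __name__ == "__main__":
    sample = "The archive of data and the database of data: data, information, archive."
    print(top_words(sample, 2, stopwords={"the", "of", "and"}))  # [('data', 3), ('archive', 2)]

    known = {
        "A": "the sea and the sky and the sea",
        "B": "of love of grief of love",
        "C": "the night of the soul and the night",
    }
    scores = burrows_delta(known, "the sky and the sea at night", n_mfw=5)
    print(min(scores, key=scores.get))  # candidate with the lowest Delta
```

The two functions make opposite choices about stop words. The frequency list drops them so that content words such as “data” or “archive” surface, as in Voyant's default word lists. Delta keeps them, because frequent function words are the stylistic fingerprint it measures. Real studies use far longer texts and typically several hundred most frequent words.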