
One of the main trends in the digital humanities is the digitization of historical materials such as letter exchanges, medieval books, and manuscripts. Western European history has produced rich archives containing valuable information on socio-political contexts. Unfortunately, these registers have long remained hidden from the public, a problem that has recently been challenged by the rapid development of digitization technologies and the internet. Historical archives usually constitute amounts of data far too large for individual researchers to process. Communities of people can therefore interact to enrich the metadata (for example, through ideas or contributions), a phenomenon often referred to as crowdsourcing.

Crowdsourcing is "the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers" [1].

This study reviews three projects that vary in the amount of external input they rely on, and proposes a general pathway for digitizing historical data.

Different challenges, different solutions

To better understand how metadata can be generated from manuscripts and how people can interact to enrich it, a few examples are taken from the DH2013 conference. They provide insight into how manuscripts can be digitized effectively according to their type.

For the Medici Archive Project [2], the translation of more than 4 million letters exchanged across Europe to and from the Medici family was achieved by a small interacting community of scholars with high levels of expertise in paleography and historical training. Indeed, several languages (Italian, German, Dutch, Latin, etc.) were used over the period from 1537 to 1743, requiring different kinds of expertise to translate these letters into English. The model developed during this project was therefore a community-sourcing model in which academically acclaimed scholars could interact and share data on a forum. A small-scale test demonstrated the ability of such a small community to enrich a digitized historical database.

The CFRP project’s main objective was to use the Comédie-Française’s repertoire to analyze socio-political trends in France between 1680 and 1793 [3]. The authors claimed that analysis of the theatre’s income over time could reveal trends in French culture as well as the political situation of a given period. Indeed, analysis of the metadata showed that people were less inclined to go to the theatre after a king’s death. Other studies with varying parameters show promise for further scholarly analysis, since new studies could build on the interpretation of these data. The data, which are simple to digitize since they consist of numbers only, required no external crowd input to build the database, yet the project still offers powerful tools on which crowdsourcing could later be built.
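The kind of trend analysis described above can be illustrated with a minimal sketch. The receipt figures and the before/after framing below are invented for illustration; the CFRP registers themselves are far richer.

```python
from statistics import mean

# Hypothetical daily box-office receipts (in livres) around a notable
# event such as a king's death; the figures are invented for illustration.
receipts_before = [412, 398, 450, 431, 405]
receipts_after = [210, 198, 243, 225, 231]

def relative_change(before, after):
    """Percentage change in mean receipts between two periods."""
    return (mean(after) - mean(before)) / mean(before) * 100

change = relative_change(receipts_before, receipts_after)
print(f"Mean receipts changed by {change:.1f}%")
```

A drop in this figure across many such events is the sort of signal the authors read as reduced theatre attendance; purely numeric data like this can be compared by tools alone, without crowd input.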

"Thus, rather than crowdsourcing, MAP's approach is one of community-sourcing, creating a hierarchy of levels of contributors" [2]

Finally, a comparative Kalendar was yet another example of digitization. The challenge for this research was to study different versions of a similar manuscript entitled the Book of Hours [4]. The main goal was to establish differences between versions of the manuscript based on the spatio-temporal context of their writing, using a distributed environment. In this way, different repositories and tools could interact to propose a comparative Kalendar, where user-generated data (transcriptions, commentary notes) can be added and shared between users, leading to a “dynamically growing resource”. This platform is thus a typical example of large-scale crowdsourcing.
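A platform of this kind rests on a shared data model for user contributions. The sketch below shows one hypothetical way annotations could be attached to a manuscript page; the class and field names are our assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Annotation:
    """A single user contribution attached to a manuscript page."""
    user: str
    kind: str      # e.g. "transcription" or "commentary"
    text: str
    created: date

@dataclass
class ManuscriptPage:
    """One page of a Book of Hours, accumulating crowd annotations."""
    shelfmark: str
    folio: str
    annotations: list = field(default_factory=list)

    def contribute(self, annotation: Annotation) -> None:
        """Append a user-generated annotation, growing the shared record."""
        self.annotations.append(annotation)

# A hypothetical page receives a transcription from one contributor.
page = ManuscriptPage(shelfmark="MS 101", folio="12r")
page.contribute(Annotation("scholar_a", "transcription",
                           "KL Ianuarius habet dies xxxi", date(2013, 7, 16)))
print(len(page.annotations))
```

Because every user appends to the same record, the resource grows dynamically with each contribution, which is exactly what makes it crowdsourced rather than curated.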

Crowdsourcing, small-community sourcing or none of them?

This comparative study has shown that historical data can be digitized, but that both the means of pursuing a study and its output can vary widely. The metadata generated from manuscripts can be obtained through crowdsourcing, small community-sourcing, or neither of these. For translation-related tasks, small community-sourcing is preferred: it allows a degree of control over the data being produced, as well as richer exchanges between participants who become familiar with one another. When digitized data must be analyzed and interpreted for comparative purposes, the amount of external input can vary drastically. Relating data must go hand in hand with interpreting it, which can be achieved either by modern technologies or by a large interacting community. If the data are simple to digitize and easy to compare (for example, the number of tickets sold per day), crowdsourcing is not necessarily a good option; instead, tools can be implemented to compare the data, on top of which other platforms or interactive studies can be built. When the data are hard to compare with standard algorithms (images, local dialects, etc.), crowdsourcing is preferred: users can annotate or add metadata to the original comparative database, and their ideas accumulate to create a dynamic platform. This results in a large amount of generated data, at the cost of the reliability of the information being produced.
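The pathway sketched above can be condensed into a simple decision rule. The categories below are our reading of the three DH2013 cases, not a formal model from any of the projects.

```python
def suggest_sourcing_model(needs_expertise: bool, machine_comparable: bool) -> str:
    """Heuristic pathway distilled from the three cases discussed.

    - Expert tasks (translation, paleography)   -> small community-sourcing.
    - Simple, machine-comparable data (numbers) -> automated tools, no crowd.
    - Hard-to-compare data (images, dialects)   -> large-scale crowdsourcing.
    """
    if needs_expertise:
        return "community-sourcing"
    if machine_comparable:
        return "automated tools, no crowd input"
    return "crowdsourcing"

print(suggest_sourcing_model(True, False))   # Medici letters
print(suggest_sourcing_model(False, True))   # CFRP registers
print(suggest_sourcing_model(False, False))  # Comparative Kalendar
```

The rule makes the trade-off explicit: the more a task resists both expert gatekeeping and automation, the more it gains from a large crowd, at the price of reliability.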

To conclude, we have presented a comparative study of digitization cases from historical archives. Different solutions exist depending on the type of data being produced. Generally speaking, crowdsourcing is a powerful tool in the digital humanities and has proven well suited to metadata enrichment from historical archives.


[1] http://www.merriam-webster.com/dictionary/crowdsourcing

[2] Opening Aladdin’s cave or Pandora’s box? The challenges of crowdsourcing the Medici Archives. Allori, Lorenzo; Kaborycha, Lisa. http://dh2013.unl.edu/abstracts/ab-312.html

[3] Visualizing Centuries: Data Visualization and the Comédie-Française Registers Project. Lipshin, Jason; Fendt, Kurt; Ravel, Jeffrey; Zhang, Jia. http://dh2013.unl.edu/abstracts/ab-458.html

[4] A Comparative Kalendar: Building a Research Tool for Medieval Books of Hours from Distributed Resources. Albritton, Benjamin; Sanderson, Robert; Ginther, James; Bradshaw, Shannon; Foys, Martin. http://dh2013.unl.edu/abstracts/ab-422.html