
“Study the past if you would define the future,” said Confucius. This may be one reason why people are so interested in learning about their ancestors: knowledge of past events and of former ways of life can help us deal with current issues.

Thanks to high-performance computing tools, historical data is no longer reserved for a small circle of archivists but is available to a much wider audience. Anyone can explore any part of the historical timeline, discover what any country was like in the past, or even learn about their own history: many people are now building their own family tree from information gathered online. But what is such a tree worth if the places mentioned are wrong, the names misspelled or the dates incomplete?

Because this new type of data is built collectively by many different groups of people, its veracity is often difficult to assess. Records in the past were also kept less systematically and consistently than they are today, which can lead to confusion. As the Digital Humanities movement gains momentum, several research groups are now developing tools to encode historical data more accurately.

For instance, a group of researchers from Europe, Canada and the USA recently launched a project called ChartEx (Charter Excavator), which aims to extract the relevant information from medieval charters [1]. In the Middle Ages, charters and deeds were the key legal records, and they contain information as varied as topography, economics and social relationships. By combining natural language processing and data mining techniques and training on a set of sample charters, the researchers taught their system to recognize features such as names, dates and locations, and, remarkably, to do so in several different languages. They then linked the extracted pieces of information and made them available through a so-called ‘virtual workbench’, which other historians can use and extend. The tool could therefore become a precise and detailed database of medieval events.

Another group, based at the University of Victoria, worked on encoding dates more accurately [2]. The motivation is that dates were often recorded haphazardly in the past. Whereas the Gregorian calendar is now used worldwide, several dating systems were in use across Europe in the Middle Ages and for long afterwards, and converting between them is not always easy. This often leads to misunderstandings or outright errors, which are unacceptable when building a precise timeline of a period. To tackle this problem, the researchers developed a tool that converts each date to its exact Gregorian equivalent where possible, or otherwise gives the range of uncertainty of the conversion. The tool was first designed for a project encoding the dates of British events, but according to the researchers it could easily be extended to other regions of the world.
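
The core of such a conversion can be sketched in a few lines of Python. This is not the Victoria group's tool but a minimal illustration, assuming the source date is in the Julian calendar and that its year already starts on 1 January (the separate problem of years reckoned from 25 March is ignored). It passes through the Julian Day Number, a continuous count of days, and when the day or month is missing it returns the earliest and latest possible Gregorian dates instead of a single value. All function names are hypothetical.

```python
from datetime import date


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date in the Julian calendar (integer arithmetic)."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # shift so the year starts in March
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """Convert a Julian Day Number to a (proleptic) Gregorian date."""
    # date.fromordinal counts from 0001-01-01 Gregorian, which is JDN 1721426.
    return date.fromordinal(jdn - 1721425)


def julian_to_gregorian(year, month, day):
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


def julian_month_length(year, month):
    """Month length under Julian rules: every fourth year is a leap year."""
    if month == 2:
        return 29 if year % 4 == 0 else 28
    return 30 if month in (4, 6, 9, 11) else 31


def julian_to_gregorian_range(year, month=None, day=None):
    """Return (earliest, latest) Gregorian dates for a possibly partial Julian date."""
    if month is None:
        if day is not None:
            raise ValueError("a day was given without a month")
        return julian_to_gregorian(year, 1, 1), julian_to_gregorian(year, 12, 31)
    if day is None:
        last = julian_month_length(year, month)
        return julian_to_gregorian(year, month, 1), julian_to_gregorian(year, month, last)
    exact = julian_to_gregorian(year, month, day)
    return exact, exact  # fully specified date: no uncertainty
```

For example, Julian 3 September 1752 comes out as Gregorian 14 September 1752, and the range for February 1700 (Julian) runs from 11 February to 11 March 1700 (Gregorian), because 1700 was a leap year in the Julian calendar but not in the Gregorian one.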


A third group, from the Martin-Luther-Universität Halle-Wittenberg in Germany, has developed a platform that simplifies the capture, annotation and indexing of metadata on historical correspondence [3]. The aim is to make entering metadata both simple and accurate. To that end, the platform includes several features for normalizing the data. For instance, one tool converts any date into an unambiguous format and checks whether it is consistent with the other information recorded. The structure of the platform also makes querying simple for visitors: the system looks for matching entries in the database and returns additional information about the query. This is very helpful for distinguishing between duplicate entries, which occur often when searching data that spans several centuries.
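
A normalization step and a consistency check of this kind might look like the following sketch. The accepted input formats, the function names and the lifetime rule are all assumptions made for illustration, not details of the Halle platform. Note that Python's datetime module uses the proleptic Gregorian calendar, so this sketch would reject a valid Julian date such as 29 February 1700; a fuller tool would probably need to record the calendar alongside the date.

```python
from datetime import date, datetime

# Hypothetical input formats; day-first is assumed for "dd/mm/yyyy", and
# "%d %B %Y" expects English month names (the default C locale).
INPUT_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%d %B %Y")


def normalize_date(text):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or raise ValueError."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date: {text!r}")


def is_plausible(iso_date, born, died):
    """Hypothetical rule: a letter should be dated within its sender's lifetime."""
    return born <= date.fromisoformat(iso_date) <= died
```

Whatever format an editor types, the stored value is the same ISO string, which makes comparisons and duplicate detection straightforward.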

Comparing these three examples, some common trends emerge. First of all, accuracy does not necessarily mean correctness; rather, it means being aware of uncertainty where it exists. All the systems in these examples try to take this uncertainty into account. It also appears that accuracy can only be reached if everyone takes part. This may be the great weakness of Digital Humanities: as it is the result of many isolated groups working separately, it is very difficult to set standards and to verify all the data that is processed. This is why the researchers in all three examples try to extend their systems to new regions or new types of data, hoping for more transparency.

Yet this weakness can also become a strength if we consider that people from different backgrounds can share their ideas to improve existing encoding systems. In this way, the ‘revolution’ in Digital Humanities enabled by new computing tools could well extend to other fields, such as tourism or environmental conservation.


[1] H. Petrie et al. (2013). ChartEx: a project to extract information from the content of medieval charters and create a virtual workbench for historians to work with this information. Digital Humanities 2013, University of York. http://dh2013.unl.edu/abstracts/ab-431.html

[2] M. Holmes et al. (2013). Encoding historical dates correctly: is it practical, and is it worth it? Digital Humanities 2013, University of Victoria. http://dh2013.unl.edu/abstracts/ab-179.html

[3] M. Andert et al. (2013). Optimized platform for capturing metadata of historical correspondences. Digital Humanities 2013, Martin-Luther-Universität Halle-Wittenberg. http://dh2013.unl.edu/abstracts/ab-246.html