Text mining is a technique for deriving high-quality information (which can often be “hidden” quite well) from all kinds of text sources. High quality means “giving new insight” or “being relevant” to a certain research question [1]. Consequently, it is an important technique in the digital humanities for analyzing sources, e.g. newspaper archives, historical texts, transcriptions of books, etc. Together with other methods it provides a rich framework for analyzing, reinterpreting or connecting different pieces of information automatically. As an example, we will look at three cases where text mining is mentioned in papers that were discussed at the Digital Humanities conference in Lausanne in 2014 [5]. Each of them tackles the topic in a different way.

In their paper “Constructing Scientific Archives that Support Humanistic Research” [2], Christopher Prom and his co-authors analyze three questions concerning the preservation of the scientific process:

  1. Which information is preserved and made accessible?
  2. What evidence does it provide?
  3. How can it be used?

They go on to describe the roles of different fields (Academic Archives, Digital Preservation, Digital Curation, Anthropology, Digital Humanities) in answering these questions. Text mining is mentioned as a method of Digital Humanities that provides an algorithmic supplement to the pursuit of past, present, and future research questions. Seen from this perspective (text mining as only a small piece in a much bigger puzzle), it becomes clear that the true strength of text mining lies in its connection with other technologies, e.g. network analysis, information visualization or machine learning in general.

This intuition is also confirmed by Uta Hinrichs’s report on a two-year project called “Trading Consequences”, whose goal is to analyze archives for information on the trading of commodities in the British Empire during the 19th century [3]. Again, text mining is used in connection with other techniques: the project focuses not only on the mining itself, but also on the visualization. Its result is a system comprising a database, several processing stages and a web interface.
They go on to describe their text mining tools (of which there seems to be a great diversity), which perform preprocessing, entity recognition and grounding. This means automatically finding names, locations, commodities and dates mentioned in the text and connecting them with existing knowledge databases. Finally, they identify relations between the entities found, in order to pin down the relevance of commodities in space and time.
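The core of such a pipeline can be illustrated with a toy sketch in Python. Everything here (the tiny gazetteers standing in for knowledge databases, the sample sentence) is invented for illustration; the project’s actual tools are of course far more sophisticated:

```python
import re

# Tiny hand-made gazetteers standing in for real knowledge databases
# (hypothetical entries, purely for illustration).
COMMODITIES = {"cinchona", "timber", "sugar"}
LOCATIONS = {"Calcutta", "London", "Jamaica"}

# Four-digit years in the 1800s/1900s.
DATE_RE = re.compile(r"1[89]\d{2}")

def extract_entities(text):
    """Very naive entity recognition: match tokens against gazetteers
    and find year-like dates with a regex."""
    tokens = re.findall(r"[A-Za-z]+|\d{4}", text)
    entities = []
    for tok in tokens:
        if tok.lower() in COMMODITIES:
            entities.append(("COMMODITY", tok))
        elif tok in LOCATIONS:
            entities.append(("LOCATION", tok))
        elif DATE_RE.fullmatch(tok):
            entities.append(("DATE", tok))
    return entities

sentence = "In 1862 large shipments of cinchona reached London."
print(extract_entities(sentence))
# [('DATE', '1862'), ('COMMODITY', 'cinchona'), ('LOCATION', 'London')]
```

Real systems replace the gazetteers with trained recognizers and link each mention to a database record (the “grounding” step), but the input/output shape is the same: raw text in, typed entities out.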
Further, they write: “The strength of information visualisation is to make abstract concepts and relations within data visible and explorable”. This explorability manifests itself in the product of the project: interactive maps and timelines highlighting events and trends in a range of document data (see picture below). A really smooth implementation of visualizing mined data.


Interactive maps and timelines to “browse” document data

Transatlantis is a project described by Toine Pieters in his paper “Cultural text mining: using text mining to map the emergence of transnational reference cultures in public media repositories” [4], which uses text mining to learn about so-called reference cultures. Reference cultures are an abstract concept: collections of models or scientific trends that are agreed on in transnational discourses over longer periods of time. The historical dynamics of reference cultures have not been systematically analyzed so far. However, they are an interesting topic to tackle with text mining, with the help of which the public discourse that led to the development of these collective frames of reference can be charted.
More specifically, the paper concentrates on the role of the United States of America as a reference culture in the public discourse of the Netherlands. To achieve this goal, the project uses text mining tools which have direct access to a large database of historical texts and can produce different kinds of output, such as word clouds or timelines, to visualize topics (see picture below).
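Word-cloud style output ultimately boils down to counting term frequencies over a corpus. A minimal sketch (the stop-word list and the sample text below are made up for illustration; the project works on real historical newspaper archives):

```python
from collections import Counter
import re

# Invented English stop-word list, for illustration only.
STOPWORDS = {"the", "of", "in", "and", "a", "to", "as"}

def top_terms(text, n=3):
    """Count word frequencies, ignoring stop words -- the raw data
    behind a word cloud or a topic timeline."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

text = ("America appears in the Dutch press; America as a model, "
        "America as a reference culture in the public press.")
print(top_terms(text))
# [('america', 3), ('press', 2), ('appears', 1)]
```

A word cloud simply maps these counts to font sizes; a timeline repeats the count per publication year, which is why such tools need direct access to the dated source database.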
They show that text mining can open new perspectives in historical research because it:

  • enables new perspectives on macro history; and
  • can be complemented with numerical data sets provided by other researchers, for example on economic and social trends.

Example output of the Transatlantis project

We conclude that text mining can be a very helpful tool for the Digital Humanities, especially when used in connection with other data crunching methods, or as part of a bigger project. We’ll see what combinations of techniques creative minds will come up with in the future…


  1. “Text mining”, in Wikipedia, retrieved October 21, 2014, from http://en.wikipedia.org/wiki/Text_mining
  2. Prom, Christopher. Constructing Scientific Archives that Support Humanistic Research, in Digital Humanities Lausanne ’14 Conference Archive, retrieved October 21, 2014, from http://dharchive.org/paper/DH2014/Paper-101.xml
  3. Hinrichs, Uta. Trading Consequences: A Case Study of Combining Text Mining & Visualisation to Facilitate Document Exploration, in Digital Humanities Lausanne ’14 Conference Archive, retrieved October 21, 2014, from http://dharchive.org/paper/DH2014/Paper-373.xml
  4. Pieters, Toine. Cultural text mining: using text mining to map the emergence of transnational reference cultures in public media repositories, in Digital Humanities Lausanne ’14 Conference Archive, retrieved October 21, 2014, from http://dharchive.org/paper/DH2014/Paper-757.xml
  5. Digital Humanities Lausanne 2014, http://dh2014.org/