
Data visualization has long helped scientists comprehend, and hence make use of, their findings. Most visualization techniques emerged in the sciences, but they can now be applied to humanities fields like English! This post looks at what we can visualize about English and why it matters.

The following three articles from the Digital Humanities 2012 conference discuss the use of visualization techniques for exploration. One explores the change and evolution of the English vocabulary. Another uses the English vocabulary to explore cultural change. The last explores the creative influence of literary works on their successors.

The first article [1] explores the evolution of the English lexicon. It employs the Historical Thesaurus of English, which is the largest thesaurus in existence and arranges the vocabulary in a hierarchy of semantic categories going back to Anglo-Saxon times. Treemaps, shown in Figure 1, were used to view changes in the history, culture, and experiences of English-speaking people by way of their language.

Figure 1: Treemap of the English vocabulary from the Historical Thesaurus of English.

The treemap, in which each entry is a rectangle, reveals the semantic structure of the English language. It also shows where new vocabulary was introduced. For instance, areas like Computing (lightly shaded on the map) are relatively new compared to Arithmetic (darkly shaded).

Next, the second article [2] presents a visualization tool called DiaView. While the tool may be applied to different areas, the article presents its application to visualizing cultural change through word usage. This was done by taking the Google Books Corpus, containing 1 million English books, and finding the prominent words in each decade or year. The authors propose that such a method provides insight into the cultural situation at a given time. Moreover, the article emphasizes that current visualization tools (e.g. the Google Books n-gram Viewer) require the user to search for a specific word and then compare its frequencies over time, or between two points in time, to quantify the change. As the authors note, this assumes that users know what they are looking for, and it generally limits their view. This is where DiaView comes in. It automatically extracts the important lexical items for each year, where important does not necessarily mean most frequently used; statistical measures such as mutual information or log-likelihood can be employed to assess a word's association with a specific year. The visualization is then simply a listing of the top words per year, with hyperlinks to other useful resources. All this gives the user the ability to simply explore.
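The difference between "important" and "most frequent" is easy to see in a small Python sketch. It ranks words per year by pointwise mutual information, one of the measures mentioned above. This is not DiaView's actual implementation; the function name, the input shape, the `min_count` cutoff, and the toy counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def top_words_per_year(counts_by_year, k=3, min_count=2):
    """Rank the words of each year by pointwise mutual information (PMI).

    counts_by_year maps a year to a Counter of word counts (assumed input shape).
    Words seen fewer than min_count times in a year are skipped, since PMI
    tends to overrate very rare words.
    """
    total = sum(sum(counts.values()) for counts in counts_by_year.values())
    word_totals = Counter()
    for counts in counts_by_year.values():
        word_totals.update(counts)

    ranked = {}
    for year, counts in counts_by_year.items():
        year_total = sum(counts.values())
        scores = {}
        for word, n in counts.items():
            if n < min_count:
                continue
            # PMI = log2( P(word, year) / (P(word) * P(year)) ),
            # simplified to use raw counts only.
            scores[word] = math.log2(n * total / (word_totals[word] * year_total))
        # Highest PMI first; ties are broken alphabetically.
        ranked[year] = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    return ranked


# Toy counts, invented for this example.
toy = {
    1850: Counter({"the": 50, "railway": 8, "steam": 6}),
    1990: Counter({"the": 50, "computer": 9, "software": 5, "railway": 1}),
}

ranked = top_words_per_year(toy, k=2)
for year in sorted(ranked):
    print(year, ranked[year])
```

In the toy data "the" is by far the most frequent word in both years, yet it never makes the list, because it is equally common everywhere and so says nothing about any particular year.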

Finally, the third article [3] explores another aspect of English, namely literary creativity. It hypothesizes that literary works influence subsequent works, intentionally or unintentionally. This influence also includes authors deliberately trying to be different from their predecessors! To study this influence, the author used a corpus of books from Britain, Ireland, and America written between 1780 and 1900. For each book, topics (themes) and stylistic information, such as relative word frequencies and punctuation marks, were extracted. Influence was then modeled as the similarity between books. Accordingly, the similarity between each book and every other book was measured, but similarities between a book and those published in the same year or earlier were of course discarded, since a book cannot be influenced by one that had not yet been published. For visualization, a tool called Gephi was used. The tool generated a network in which books were the nodes and similarities were the edges: the longer the edge, the less similar the books. Figure 2 shows the network with the nodes colored according to publication date. As can be seen, there is a gradual shading from right to left, indicating how styles and themes develop over time.

Figure 2: Network diagram of the books. Similar books are close to each other.
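The pipeline behind such a network can be sketched in a few lines of Python. The article does not specify its similarity measure, so cosine similarity is used here as a stand-in, and the book titles, years, and feature vectors are hypothetical. Only edges from an earlier book to a later one are kept, and each edge gets a length of one minus the similarity, so that less similar books sit further apart.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def influence_edges(books):
    """Return (earlier_title, later_title, distance) for every valid pair.

    A pair is valid only if the second book was published in a strictly
    later year than the first. Distance is 1 - cosine similarity.
    """
    edges = []
    for source in books:
        for target in books:
            if target["year"] > source["year"]:
                similarity = cosine(source["features"], target["features"])
                edges.append((source["title"], target["title"], 1.0 - similarity))
    return edges


# Hypothetical books; the features stand in for topic and style measurements.
books = [
    {"title": "Book A", "year": 1800, "features": [1.0, 0.0, 1.0]},
    {"title": "Book B", "year": 1800, "features": [1.0, 0.0, 1.0]},
    {"title": "Book C", "year": 1850, "features": [1.0, 1.0, 0.0]},
]

edges = influence_edges(books)
for edge in edges:
    print(edge)
```

Note that Book A and Book B are identical in features but share a publication year, so no edge connects them. An edge list like this could then be exported to a graph tool such as Gephi for layout and coloring.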

An interesting fact mentioned by the author is that some of the great works are actually outliers from the main clusters. Furthermore, the most original and influential authors turn out to be Jane Austen and Walter Scott, who form, as the author puts it, the Adam and Eve of the stylistic-thematic genealogy.

In conclusion, visualization techniques are quite helpful in revealing hidden patterns in data, which not only improves understanding but enables exploration as well. Although some are as simple as a treemap, a list, or a network, their application to the humanities provides powerful insights into possible areas of importance or interest, and they are thus indispensable to analysts.


  1. Patchworks and Field-Boundaries: Visualizing the history of English. Alexander, Marc, University of Glasgow, UK, marc.alexander@glasgow.ac.uk
  2. DiaView: Visualise Cultural Change in Diachronic Corpora. Beavan, David, University College London, UK, d.beavan@ucl.ac.uk
  3. Computing and Visualizing 19th century literary genome. Jockers, Matthew, Stanford University, USA, mjockers@stanford.edu