Content analysis has long been a problem for historians because of the volume of archival material that researchers must read and examine. Progress is slow, and duplicated work is sometimes unavoidable. The field began in the 1940s with word-frequency counts; today, computers automate complex algorithms that analyse the semantic properties of source material.

The first paper, Concepts Through Time: Tracing Concepts In Dutch Newspapers Discourse (1890-1990) Using Word Embeddings [1], presents a new technique for historical search. Traditional searches require the researcher to define a concept through a fixed set of terms, either chosen by hand or generated automatically by an algorithm. However, the meaning of a concept changes over time, so historians must repeatedly recalibrate these static definitions, which makes it hard to trace a concept over a long period. The technique introduced in the paper, CTT (Concepts Through Time), removes this dependence on static definitions. It is based on word embeddings: neural networks are trained on a large data set so that changes in the set of semantically related words can be monitored over time. This makes the method more sensitive to semantic change, and the paper demonstrates CTT by tracing concepts across the 500,000 newspaper issues (1890-1990) held by the Dutch National Library.
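The idea of following a concept's semantically related words through time can be sketched in a few lines. This is only an illustration, not the CTT implementation from the paper: it assumes that one embedding space has already been trained per period, and the two-dimensional vectors, the period labels and the helper names `cosine`, `nearest` and `trace` are all invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(term, vectors, k=2):
    """Return the k words closest to `term` in one period's embedding space."""
    target = vectors[term]
    scored = [
        (cosine(target, vec), word)
        for word, vec in vectors.items()
        if word != term
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def trace(term, periods, k=2):
    """Map each period to the words most related to `term` in that period."""
    return {period: nearest(term, vectors, k) for period, vectors in periods.items()}


# Hypothetical toy embeddings: real models would have hundreds of dimensions
# and would be trained on the newspaper text of each period.
periods = {
    "1890-1910": {
        "efficiency": [1.0, 0.1],
        "machine": [0.9, 0.2],
        "factory": [0.8, 0.3],
        "household": [0.1, 1.0],
    },
    "1950-1970": {
        "efficiency": [0.2, 1.0],
        "machine": [1.0, 0.1],
        "factory": [0.9, 0.0],
        "household": [0.3, 0.9],
        "economy": [0.4, 0.8],
    },
}

related = trace("efficiency", periods)
for period, words in related.items():
    print(period, words)
```

In this toy data the neighbours of "efficiency" shift from industrial terms in the first period to domestic and economic terms in the second, which is the kind of drift a static term list would miss.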

The second paper, Quantifying Ambiguity by Gamifying the Writing Process: A Case Study on William Blake’s “The Sick Rose” [2], presents the use and impact of ambiguity analysis through a case study on poetry. The proposed algorithmic tool takes a text as input and produces an ambiguity number that represents the magnitude of the text's ambiguity. This allows texts to be compared objectively and individual phrases or words to be quantified. One finding of the study is that the dictionaries used by poets and readers contain ambiguous definitions, for example circular ones: the definition of a word contains a second word whose own definition refers back to the first. The authors are also inspired to build a poetry dictionary with minimal ambiguity, which uses text as well as multimedia data to identify concepts and link associated entities.
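The circular-definition problem described above is easy to check mechanically. The sketch below is an illustration only and is not the tool from the paper: the dictionary entries are invented, and `circular_pairs` is a hypothetical helper that flags two headwords when each appears in the other's definition.

```python
def circular_pairs(dictionary):
    """Return sorted pairs of headwords whose definitions mention each other."""
    # Split every definition into a set of lower-case tokens.
    tokens = {word: set(defn.lower().split()) for word, defn in dictionary.items()}
    pairs = set()
    for word, defn_words in tokens.items():
        for other in defn_words:
            # `other` must itself be a headword that refers back to `word`.
            if other != word and other in tokens and word in tokens[other]:
                pairs.add(tuple(sorted((word, other))))
    return sorted(pairs)


# Invented toy dictionary, for illustration only.
toy_dictionary = {
    "rose": "a flower with thorns",
    "flower": "the bloom of a plant such as a rose",
    "sick": "affected by illness",
    "illness": "the state of being sick",
    "worm": "a small crawling animal",
}

loops = circular_pairs(toy_dictionary)
print(loops)
```

A real dictionary would need lemmatisation and longer chains (A refers to B, B to C, C back to A), which this direct two-word check does not cover.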

The third paper, ‘“Everything on Paper Will Be Used Against Me”: Quantifying Kissinger’ [3], uses content analysis techniques to interpret the Digital National Security Archive’s Kissinger Collection and to understand the internal contradictions of Henry Kissinger, who was involved in many American affairs during the Cold War. The collection contains over 18,000 memcons (memoranda of conversation) and telephone transcripts. Word frequencies and mutual information scores were analysed to study Kissinger's social relationships and their impact on his decisions and his views of historical events. Several graph visualisations, based on the collocation of key words and geographic locations, were also produced to give further insight into Kissinger.
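One common way to score how strongly two words are associated is pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), where the probabilities are estimated from how often the words occur, alone and together, across documents. The sketch below assumes this document-level definition; the measure actually used in the project may differ, and the four "documents" and the function name `pmi_scores` are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return PMI for every word pair that co-occurs in at least one document."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        word_counts.update(words)
        # Each pair is an alphabetically ordered tuple, counted once per document.
        pair_counts.update(combinations(words, 2))
    scores = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = word_counts[a] / n_docs
        p_b = word_counts[b] / n_docs
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores


# Invented toy corpus standing in for transcripts.
documents = [
    "vietnam bombing secret",
    "vietnam bombing talks",
    "china talks visit",
    "china visit secret",
]

scores = pmi_scores(documents)
print(scores[("bombing", "vietnam")])
print(scores[("secret", "vietnam")])
```

A positive score means the two words appear together more often than chance would predict, a score of zero means they look independent, and pairs that never co-occur get no score at all.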

In comparison, the last paper focuses on conceptual analysis of the archive, whereas the first two papers make use of relational analysis techniques. Conceptual analysis answers a research question by quantifying selected words or patterns in a given sample source. Relational analysis looks at the semantics of, and the relationships between, concepts and entities.

Content analysis is key to understanding the intentions and motivations of individuals or groups. It can also be used to reason about and describe past events, identify current trends, or even predict future ones.


[1] Concepts Through Time: Tracing Concepts In Dutch Newspapers Discourse (1890-1990) Using Word Embeddings. Melvin Wevers, University of Utrecht; Tom Kenter, University of Amsterdam; Pim Huijnen, University of Utrecht.

[2] Quantifying Ambiguity by Gamifying the Writing Process: A Case Study on William Blake’s “The Sick Rose”. Dana Milstein, Yale University; Euan Cochrane, Yale University.

[3] ‘“Everything on Paper Will Be Used Against Me”: Quantifying Kissinger’. Micki Kaufman, CUNY Graduate Center, United States of America.