
Recent advances in machine learning, network analysis, and natural language processing have produced large-scale tools and systems that are not only faster but can also deliver results that would be very hard for humans to derive. This has led researchers to apply methods based on these innovations to fields that seem only loosely related to computer science, such as music, the arts, and literature. The articles we discuss in this post use techniques from these computer science fields to extract results about various properties of literary and theatrical works. While their processes and end goals differ, all three aim to analyze the characters, their relationships, and their effect on the respective works.

The first two articles both use a technique known as character network analysis, which was first proposed in literary circles back in 2004 but was not fully applied until it was formally introduced around 2011. The method consists of building a graph, or “network”, that represents each character as a node and signifies a relation between two characters with an edge connecting their nodes. After the network is constructed, various metrics, such as density, centrality, coreness, and closeness, are used to evaluate its nature.
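
To make the idea concrete, here is a minimal sketch of such a network in plain Python. It is not the code used in either article: the toy scenes, the character names, and the rule that two characters are related when they share a scene are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each scene lists the characters who appear together.
scenes = [
    {"Anna", "Boris", "Clara"},
    {"Anna", "Boris"},
    {"Anna", "Dmitri"},
]

def build_network(scenes):
    """Return an adjacency map: character -> set of characters sharing a scene."""
    graph = {}
    for scene in scenes:
        for name in scene:
            graph.setdefault(name, set())
        # Assumption: co-appearance in a scene counts as a relation.
        for a, b in combinations(sorted(scene), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def density(graph):
    """Fraction of all possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)

def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {name: 0.0 for name in graph}
    return {name: len(neighbors) / (n - 1) for name, neighbors in graph.items()}

network = build_network(scenes)
print(density(network))
print(degree_centrality(network))
```

In this toy network, Anna is linked to every other character, so her degree centrality is 1.0, and four of the six possible edges exist, which gives a density of about 0.67.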

The analysis conducted by Michael Falk of the University of Kent [1] is based on the novels of British writer Maria Edgeworth, whose works have been the subject of scrutiny in literary research circles. While most experts emphasize the differences between her “British” and “Irish” novels, this publication uses character network analysis to underline their similarities as well. The analysis found that every novel has one character node with a high degree of centrality, i.e. each novel is focused on one central main character. Additionally, running a community detection algorithm on the generated networks led to the conclusion that all of the novels depict the world as a network of noble households. The analysis also brings out the differences between the two types of novels: the “British” novels are represented by denser networks, while the “Irish” novels have a higher degree of betweenness. This leads to the conclusion that the “British” novels depict smaller and more compact communities with more character interaction, while the “Irish” novels are structured around a main character who connects the various minor characters.
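
The contrast between density and betweenness can be illustrated with two hypothetical five-character networks: a clique, where everyone interacts with everyone, and a star, where a single hero connects otherwise unrelated minor characters. The sketch below uses Brandes' algorithm for unweighted graphs to compute unnormalized betweenness; the graphs and names are invented for illustration and are not taken from the study.

```python
from collections import deque

def density(graph):
    """Fraction of all possible undirected edges that are present."""
    n = len(graph)
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)

def betweenness(graph):
    """Unnormalized betweenness (Brandes) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted from both endpoints, so halve.
    return {v: s / 2 for v, s in score.items()}

names = ["Hero", "A", "B", "C", "D"]
clique = {v: set(names) - {v} for v in names}          # compact community
star = {"Hero": {"A", "B", "C", "D"},                  # hero links everyone
        **{v: {"Hero"} for v in ["A", "B", "C", "D"]}}

print(density(clique), betweenness(clique)["Hero"])
print(density(star), betweenness(star)["Hero"])
```

The clique has density 1.0 and no character lies between any pair of others, while the star has density 0.4 and its hero sits on the shortest path between all six pairs of minor characters.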

Yannick Rochat of EPFL [2] used the same methodology to analyze “Les Rougon-Macquart”, a collection of novels written by Émile Zola. While some of these novels depict one central main character, others divide their narrative among several characters. This diversity makes the collection a good case study for character network analysis. As in the previous study, a character network is constructed and metrics such as density and centrality are computed. In addition, a study of coreness is conducted, which measures how compact the main character group is. As a result of this analysis, the novels are classified according to the properties of their main characters and can be further distinguished based on character strength and network sparsity.
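
One common way to quantify coreness is k-core decomposition, in which a node's core number is the largest k such that the node survives when all nodes with fewer than k neighbors are repeatedly removed. The sketch below assumes this definition, which may differ in detail from the one used in the article, and runs on a hypothetical four-character network.

```python
def core_numbers(graph):
    """Core number of each node, computed by repeatedly peeling off
    the node with the smallest remaining degree."""
    degree = {v: len(neighbors) for v, neighbors in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in graph[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Hypothetical network: a tight trio plus one peripheral character.
graph = {
    "Gervaise": {"Coupeau", "Lantier", "Neighbor"},
    "Coupeau": {"Gervaise", "Lantier"},
    "Lantier": {"Gervaise", "Coupeau"},
    "Neighbor": {"Gervaise"},
}
print(core_numbers(graph))
```

Here the three mutually connected characters form the 2-core, while the peripheral character, who knows only one of them, has core number 1.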

The third article is a study of Noh farces (farces in the traditional Japanese performing arts, also known as Kyogen) conducted by Akihiro Kawase of the National Institute for Japanese Language and Linguistics [3]. Noh farces are written in a succinct dialog format and depict numerous characters from different backgrounds and social strata. As such, they are valuable for humanities research, because they offer insight into the evolution of the Japanese language as well as into the culture itself.
The process followed in this analysis is a little different: instead of building a network based on character interactions, the study examines character speech and uses keywords associated with each social status to build co-occurrence networks. Metrics used in word analysis (term frequency and inverse document frequency, among others) are then extracted to evaluate each network. The same density and centrality analysis used in the previous articles is applied here as well. The analysis concludes that character roles can be determined from their speech, a notable result that can be important for related future research.
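
As a rough illustration of the word-level side of this approach, the sketch below computes TF-IDF scores per role and counts word co-occurrences within utterances. The tokenized English speech is hypothetical, and the TF-IDF formula (relative frequency times the natural log of the inverse document frequency) is one common variant, not necessarily the one used in the article.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: tokenized speech grouped by the speaker's role.
speech_by_role = {
    "lord": ["bring", "the", "sake", "servant"],
    "servant": ["yes", "my", "lord", "the", "sake"],
    "priest": ["the", "temple", "bell"],
}

def tf_idf(docs):
    """TF-IDF per role: (count / length) * ln(number of roles / roles using the word)."""
    n_docs = len(docs)
    doc_freq = Counter(word for tokens in docs.values() for word in set(tokens))
    scores = {}
    for role, tokens in docs.items():
        counts = Counter(tokens)
        scores[role] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

def cooccurrence_edges(utterances):
    """Count how often two distinct words appear in the same utterance."""
    edges = Counter()
    for tokens in utterances:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges

scores = tf_idf(speech_by_role)
edges = cooccurrence_edges(speech_by_role.values())
print(scores["priest"])
print(edges.most_common(3))
```

A word used by every role, such as “the”, receives a score of zero, while a word used by a single role, such as “temple”, scores highest for that role. The weighted word pairs in `edges` can then be treated as a co-occurrence network and fed to the same density and centrality measures used for character networks.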

The three studies discussed in this post derive concrete results from literary works of three different countries while following similar methodologies. This suggests a degree of universality in literary analysis that humanities research should leverage. Granted, the last article takes a seemingly different approach, but in the end it relies on the same notions of network analysis to derive its final results. All of this points to the importance of character network analysis, a technique that has proven effective and promises to be a major asset in humanities research in the years to come.


[1] Michael Gregory Falk, University of Kent. Modelling Genre Using Character Networks: The National Tales and Domestic Novels of Maria Edgeworth

[2] Yannick Rochat, EPFL (2015). Character Network Analysis of Émile Zola’s Les Rougon-Macquart

[3] Akihiro Kawase, National Institute for Japanese Language and Linguistics. The Characteristics of Personae Observed Within Toraakirabon Kyogen Scripts: Extracting Conceptual Words Using Quantitative Analysis