This is an important part of what digital humanities is about: not only using modern computer technology to store large amounts of data, but also harnessing the power of computers to extract information from that data. Although literary studies and computer science seem at first glance to have very little in common, tools developed in computer science can be used to perform quantitative analysis of literature. Computers do more than perform certain tasks faster than any human could: the quantitative results they produce differ from what a qualitative approach can offer. The computational approach opens up new perspectives and has already proven useful in various sub-fields of digital humanities.
I will try to convince you of this using three abstracts from the 2014 Digital Humanities conference (you can find the links to these abstracts below).
The first article, perhaps surprisingly, is about computational poetics. It might seem strange to mix computational methods and poetry, but the massive digitization of nineteenth-century texts has made it possible. The author's goal is to extend to poetics the computational methods already used to recognize the linguistic patterns characteristic of particular authors. In her paper she focuses on the historical practice of enjambment (the continuation of a sentence past the end of a verse line, a key feature of poetry). She introduces three quantitative measures that should help us better understand how enjambment is used across poetic forms, themes and genres, or in a particular poet's style.
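To make the idea of a quantitative measure of enjambment more concrete, here is a minimal Python sketch of a deliberately crude proxy: the share of line breaks that occur without end-of-line punctuation. This is not one of the three measures from the paper, which the summary above does not spell out; the function names, the set of end-stopping marks and the sample stanza are all hypothetical, and punctuation is only a rough signal of whether the syntax runs on into the next line.

```python
# Marks we (hypothetically) treat as end-stopping a verse line.
END_STOP_MARKS = set(".,;:!?")
# Closing quotes and brackets to skip before looking at the final mark.
TRAILING_CLOSERS = "\"')\u201d\u2019"


def is_end_stopped(line: str) -> bool:
    """True if the line ends with punctuation from END_STOP_MARKS."""
    core = line.rstrip().rstrip(TRAILING_CLOSERS)
    return bool(core) and core[-1] in END_STOP_MARKS


def enjambment_rate(poem: str) -> float:
    """Share of line breaks that fall without end-stopping punctuation.

    A crude, hypothetical proxy for enjambment. The final line is ignored
    because no line break follows it.
    """
    lines = [line for line in poem.splitlines() if line.strip()]
    if len(lines) < 2:
        return 0.0
    breaks = lines[:-1]
    enjambed = sum(1 for line in breaks if not is_end_stopped(line))
    return enjambed / len(breaks)


# A made-up stanza used only for illustration.
sample = """The river runs beneath the bridge,
and carries every leaf it finds
toward the sea
before the evening falls."""

print(round(enjambment_rate(sample), 3))  # → 0.667
```

Computed over a whole corpus, a number like this could be compared across poets, decades or verse forms, which is the kind of question the article asks; the measures in the paper are presumably more refined than a punctuation check.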
The second article is about computational stylistics, and more specifically about analyzing perspective in novels. The authors chose a popular Japanese novelist who explicitly switches perspectives in his novels. They then applied random forests, a machine learning method that had already given good results for authorship attribution in stylistics. The choice makes sense because one of their goals was closely analogous: they wanted to test whether the algorithm could correctly attribute each section to the corresponding character's perspective.
The last article discusses computational stylistics in a wider sense: how the field can be expanded, and how the quantitative results obtained through computational methods can be validated. It again notes that computational text analysis methods have already been developed for authorship attribution. Such methods can be validated on real data whose origin is known, and then adapted so that they also work on other texts. With this idea in mind, computational methods can be applied to other classification problems and validated, as long as there is data with known labels to test them on. The article gives some examples of studies that address questions other than authorship attribution. For example, one of them analyzes genre, date and form in addition to authorship, focusing on French Enlightenment plays.
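The validation idea can be sketched in a few lines of Python. This is a hedged illustration under simplifying assumptions, not the method of any of the three articles: each text is reduced to the relative frequencies of a handful of function words (a hypothetical feature list), classified with a simple nearest-centroid rule instead of random forests, and evaluated with leave-one-out validation on a tiny made-up corpus whose labels are known. The function names, labels and texts are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: relative frequencies of a few English function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "i", "you", "he", "she"]


def profile(text):
    """Relative frequency of each function word in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict(train, text):
    """Nearest-centroid classification: pick the label whose mean profile is closest."""
    by_label = {}
    for label, doc in train:
        by_label.setdefault(label, []).append(profile(doc))
    target = profile(text)
    return min(by_label, key=lambda lab: math.dist(target, centroid(by_label[lab])))


def leave_one_out_accuracy(corpus):
    """Hold out each text of known origin in turn, train on the rest, check the guess."""
    correct = 0
    for i, (label, doc) in enumerate(corpus):
        train = corpus[:i] + corpus[i + 1:]
        if predict(train, doc) == label:
            correct += 1
    return correct / len(corpus)


# Toy corpus with hypothetical labels "A" and "B" standing in for known authors.
corpus = [
    ("A", "I told you I would come and you said I should"),
    ("A", "you know I think you are right and I agree"),
    ("B", "the history of the city and the fall of the empire"),
    ("B", "the rise of the state in the age of the machine"),
]

print(leave_one_out_accuracy(corpus))  # → 1.0
print(predict(corpus, "the fall of the state"))  # → B
```

Once a method scores well on texts whose origin is known, the same pipeline can be pointed at a different kind of label, such as a genre, a date range or a character's perspective, and the validation step is what justifies trusting it on new material. A real study would use much longer texts, many more features and a stronger classifier.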
As these three articles show, computational analysis of texts is not completely new, and it has already been used successfully for authorship attribution. This points to a theme common to all three articles: extending a successful idea to a new but similar problem. The generality of the computational methods involved makes such transfers easier.