Data visualization is fast becoming a prerequisite for deep understanding in nearly every field of study. This is unsurprising, since sight is the main channel through which humans take in information. Visualizing data stimulates our ability to recognize patterns and so helps us understand the causal relations underlying a data set.
This article focuses on tools designed for one particular area of study: literature. The computational tools discussed here were presented in three abstracts at the Digital Humanities 2013 conference, and all of them support the idea of using visualization to understand texts thoroughly.
PoemViewer is a tool for visualizing poetry, developed by a group of researchers from the University of Oxford and the University of Utah. This web application is built upon preexisting linguistic mechanisms and models, such as the International Phonetic Alphabet for classifying sounds and the CLAWS part-of-speech tagger for English text.
The main purpose of this tool is to allow a fuller understanding of how poems are built, satisfying both the qualitative and the quantitative sides of literary research. Generating visualizations, in this case, does not require oversimplifying the poem for the sake of quantitative analysis; instead, it preserves and evaluates the complexity of the artwork.
The tool treats poems as ever-changing systems, acknowledging the strong influence of components such as rhyme, meter, metaphor, tone, and the emotional charge of specific words, which give a poem its multifaceted quality. PoemViewer generates time-dependent visualizations which depict such poetic components. For example, a rhyme can contribute to the reader’s anticipation of the next verse (moving forward through time), or it can make them recall a past occurrence of the same acoustic structure. Within this tool, the poem becomes a multi-dimensional variable space in which the movement of sound is traced: the poem is associated with a “flow of sound”.
Readers may choose which linguistic component they want to track, and they have the option of viewing its evolution independently or in relation to another component (see Fig. 1). For example, one can assess how different phonemes relate to and influence each other within the poem.
This kind of analysis extends the scope of research from a single poem to poems from different literary epochs. It could open the way to studying anything from writers’ psychological patterns to poetry’s response to historical events.
A second, highly intuitive and original approach comes from two researchers at UC Santa Barbara, who developed an application called VizOR. It generates a visualization of the narratives in Mark Z. Danielewski’s 2006 novel Only Revolutions.
The novel has a very unusual structure: both covers look like the front cover, and depending on which end one starts reading from, the novel unfolds either Sam’s or Hailey’s story. Each page contains some upside-down text, which is part of the other character’s narrative at a more advanced point in the story. Complex numerological subtleties also appear in the choice of words, names, pagination and text structure (the recurrence of 8 and its multiples). With two clearly defined narratives built upon a strong symmetry and raw data patterns, Only Revolutions is a perfect subject for testing and developing an application like VizOR.
The tool is programmed in Python and manipulates a MySQL database containing the complete text of the aforementioned novel. The database is designed in such a way that it allows highly flexible queries which result in specific visualizations of the narrative. The tool allows the user to “query a specific word of a particular character’s narrative or chronology, the text from a specific line of a character’s narrative or chronology on a specific page, or the narrative or chronology text from a whole page for a specific character”.
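To make the three kinds of query in that quotation more concrete, here is a minimal sketch. It is not VizOR's actual code or schema: the table layout, the column names and the function names are all assumptions made for illustration, the rows are placeholder strings rather than text from the novel, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: one row per line of text. "stream" is either
# "narrative" or "chronology". All names are illustrative assumptions.
SCHEMA = """
CREATE TABLE lines (
    speaker TEXT,
    stream  TEXT,
    page    INTEGER,
    line_no INTEGER,
    text    TEXT
)
"""

def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("Sam", "narrative", 1, 1, "placeholder line alpha"),
        ("Sam", "narrative", 1, 2, "placeholder line beta"),
        ("Sam", "chronology", 1, 1, "placeholder date entry"),
        ("Hailey", "narrative", 1, 1, "placeholder line alpha again"),
    ]
    conn.executemany("INSERT INTO lines VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def find_word(conn, speaker, stream, word):
    """Lines of one character's stream that contain `word` (substring match)."""
    cur = conn.execute(
        "SELECT page, line_no, text FROM lines"
        " WHERE speaker = ? AND stream = ? AND text LIKE ?"
        " ORDER BY page, line_no",
        (speaker, stream, f"%{word}%"),
    )
    return cur.fetchall()

def get_line(conn, speaker, stream, page, line_no):
    """Text of one specific line on one page, or None if it is absent."""
    row = conn.execute(
        "SELECT text FROM lines"
        " WHERE speaker = ? AND stream = ? AND page = ? AND line_no = ?",
        (speaker, stream, page, line_no),
    ).fetchone()
    return row[0] if row else None

def get_page(conn, speaker, stream, page):
    """All lines of one character's stream on a page, in reading order."""
    cur = conn.execute(
        "SELECT text FROM lines"
        " WHERE speaker = ? AND stream = ? AND page = ?"
        " ORDER BY line_no",
        (speaker, stream, page),
    )
    return [row[0] for row in cur]

conn = build_demo_db()
print(find_word(conn, "Sam", "narrative", "alpha"))
print(get_line(conn, "Sam", "chronology", 1, 1))
print(get_page(conn, "Sam", "narrative", 1))
```

Storing one row per line, keyed by character, stream, page and line number, is what would let all three queries share a single table. Whether VizOR organizes its data this way is not stated in the abstract.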
The visualization consists of three circles: a large one encompassing two smaller ones, which correspond to the two narratives. The user may browse through the smaller circles and stop upon finding a paragraph of interest. Simply clicking on it generates a database query that brings up the corresponding paragraph in the other character’s narrative. Users can also receive immediate feedback on certain terms by hovering over them and choosing specific words or lines. Clicking on the chosen item makes the circles rotate, realigning the corresponding terms.
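The rotation described above can be pictured with a little geometry. The sketch below is only one assumed way of doing it, not a description of how VizOR computes its layout: it supposes that items are spread evenly around each circle, and the function names are hypothetical.

```python
def angle_of(index, total):
    """Angle in degrees of item `index` among `total` evenly spaced items."""
    return 360.0 * index / total

def rotation_to_align(current, target):
    """Shortest signed rotation in degrees, in (-180, 180], from current to target."""
    delta = (target - current) % 360.0
    return delta - 360.0 if delta > 180.0 else delta

# Example with made-up positions: rotate one circle so that its item 10 of 40
# lines up with item 30 of 40 on the other circle.
turn = rotation_to_align(angle_of(10, 40), angle_of(30, 40))
print(turn)
```

Taking the shortest signed rotation keeps the animation from spinning the long way around when two items are close together across the 0 degree mark.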
The visualization of such queries facilitates the discovery of hidden patterns, giving rise to new ways of interpreting the novel. The application appeals to human visual intuition and provides the reader with a deeper understanding of the artwork.
Stylometry with R
On the more technical side of literary study, a suite of scripts has been developed in R which implements methods from the realm of stylometry. Stylometry is the quantitative study of linguistic style in written language, often with the purpose of attributing authorship to anonymous or disputed writings.
The suite is composed of 5 scripts: Stylo, Classify, Rolling Delta, Oppose Test and Keywords. They work by analyzing the stylistic aspects of a series of input texts and providing results in both numerical and graphical form. Stylo is the main tool, and it combines sophisticated classification and clustering algorithms with an accessible interface. The scripts perform many functions, such as listing the words in the input texts along with their frequencies, normalizing frequencies, and producing visualizations of bootstrap consensus trees. The suite also explicitly supports 9 languages.
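The first two functions mentioned above, listing word frequencies and normalizing them, are easy to illustrate. The suite itself is written in R; the sketch below uses Python purely for illustration and makes no claim to reproduce the suite's behavior. The function names and the very simple tokenizer (runs of letters, lowercased) are assumptions.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lowercased words; a word is taken to be any run of letters."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))

def relative_frequencies(counts):
    """Normalize raw counts so that they sum to 1 for the text."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

def frequency_table(text):
    """Rows of (word, raw count, relative frequency), most frequent first."""
    counts = word_counts(text)
    rel = relative_frequencies(counts)
    return [(word, n, rel[word]) for word, n in counts.most_common()]

for row in frequency_table("The cat and the hat."):
    print(row)
```

Normalizing matters because input texts differ in length: raw counts from a long novel and a short story cannot be compared directly, whereas relative frequencies can.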
The suite includes procedures that improve the attribution process, such as automatic deletion of personal pronouns and culling. The Rolling Delta script can be used in the analysis of collaborative works in order to determine the authorship of text fragments.
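For readers unfamiliar with these terms, the sketch below shows pronoun deletion, culling and a Delta-style distance on toy input. It is an illustration in Python under stated assumptions, not the suite's R code. Culling is taken here to mean keeping only words that occur in at least a given share of the texts, the distance is the classic Burrows's Delta (the mean absolute difference of z-scored word frequencies), the pronoun list is a short English sample, and the population standard deviation is an arbitrary choice.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Short illustrative list of English personal pronouns (an assumption).
PRONOUNS = frozenset(
    {"i", "you", "he", "she", "it", "we", "they", "me", "him", "her", "us", "them"}
)

def rel_freqs(text, drop=frozenset()):
    """Relative word frequencies of a text, ignoring the words in `drop`."""
    words = [w for w in re.findall(r"[^\W\d_]+", text.lower()) if w not in drop]
    counts = Counter(words)
    return {w: n / len(words) for w, n in counts.items()} if words else {}

def cull(profiles, min_share):
    """Words that occur in at least `min_share` (0 to 1) of the texts."""
    doc_count = Counter(w for profile in profiles.values() for w in profile)
    return sorted(w for w, k in doc_count.items() if k / len(profiles) >= min_share)

def zscores(profiles, vocab):
    """Standardize each word's frequency across all texts in the corpus."""
    z = {name: {} for name in profiles}
    for w in vocab:
        values = [p.get(w, 0.0) for p in profiles.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, p in profiles.items():
            z[name][w] = (p.get(w, 0.0) - mu) / sigma if sigma else 0.0
    return z

def delta(z_a, z_b, vocab):
    """Burrows's Delta: mean absolute z-score difference (vocab must be non-empty)."""
    return mean(abs(z_a[w] - z_b[w]) for w in vocab)

texts = {"t1": "the the cat", "t2": "the cat cat"}  # toy input, not real data
profiles = {name: rel_freqs(t, PRONOUNS) for name, t in texts.items()}
vocab = cull(profiles, 1.0)  # keep words present in every text
z = zscores(profiles, vocab)
print(delta(z["t1"], z["t2"], vocab))
```

A smaller Delta means two texts use the retained words in more similar proportions. Computing such a distance over successive windows of a collaborative work, against samples from each candidate author, is the general idea behind a rolling analysis.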
Beyond the few features mentioned above, the suite provides the user with many other advanced analysis mechanisms specific to quantitative comparative linguistics. A possible visualization produced by the Stylo script can be seen in Fig. 2.
Together with the user-friendly interface, these features make this suite of scripts a very powerful tool for scholars who want to manipulate texts and uncover their particular technical and stylistic aspects.
The three applications clearly suggest that data visualization is the way forward in a world in which literature has become more of a puzzle-solving activity. While the R scripts and PoemViewer are mostly intended for expert use, VizOR is a bit less demanding with respect to prior knowledge of literature or text analysis. It can easily be used by the curious reader to better understand the underlying symbols and symmetry which structure the novel. On the other hand, PoemViewer and VizOR are better suited to qualitative analysis of texts than the R suite, which performs quantitative analysis and generates statistics. The R suite and PoemViewer also have the potential to perform intertextual analysis.
The abstracts summarized above suggest that, in an era in which the volume of literature has exploded, computational tools are needed for time-efficient analysis. Data visualization is clearly a trend in the interpretation of literature, whether from an ordinary reader’s point of view or a specialist’s. With the passage of time, it will surely become a necessity.
Abdul-Rahman, Alfie, Katharine Coles, Julie Lein, and Martin Wynne. “Freedom and Flow: A New Approach to Visualizing Poetry” (n.d.). http://dh2013.unl.edu/abstracts/ab-143.html.
Solomon, Dana Ryan, and Lindsay Thomas. “VizOR: Visualizing Only Revolutions, Visualizing Textual Analysis” (n.d.). http://dh2013.unl.edu/abstracts/ab-255.html.