Text mining is one of the most important methods in digital humanities research. As more written documents are digitized, the amount of text available for analysis has grown rapidly. This growth has driven the development of powerful text analysis software that serves a variety of needs. In this post we introduce two capable text analysis tools: Voyant Tools and PhiloLogic.
As its homepage states, Voyant Tools is a web-based platform that bundles several text analysis tools. The main page shows a text box where the user supplies the text, either by pasting it in full or by giving a URL. The user can enter more than one document, so Voyant Tools can also be used to work across documents. Once the texts are loaded, the analysis can start with any of the tools.
The tools are designed in a modular fashion and are easy to use. One of the handiest is Cirrus. Cirrus displays the words that occur in the text as a word cloud, indicating how frequently each word occurs by drawing it in a larger font. When using this tool the user can define “stop words”, that is, words to be ignored when the cloud is built. For English these would be words such as “the”, “and” and “but”, which appear often in any text but do not necessarily carry information about the text itself. Predefined stop word lists are available for English, French, German and some other languages. Cirrus can convey what a text is about in a concise and easy-to-read manner, so a variety of websites use it to guide their visitors. An example of Cirrus embedded in a webpage to show word frequencies can be seen on the website of the Digital Humanities 2012 Conference in Hamburg. Below is the Cirrus generated for the dh101.ch website.
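The counting behind a word cloud of this kind can be sketched in a few lines of Python. This is not Voyant's implementation, only a minimal illustration of the idea: the short stop word list, the function names and the linear font scaling are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "but", "a", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word occurs, skipping the stop words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def font_sizes(freqs, min_size=10, max_size=48):
    """Scale counts linearly to font sizes (hypothetical scaling rule)."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - lo) * (max_size - min_size) / span
        for word, count in freqs.items()
    }


if __name__ == "__main__":
    sample = "The cat and the dog. The dog barks but the cat sleeps."
    freqs = word_frequencies(sample)
    for word, size in sorted(font_sizes(freqs).items()):
        print(word, freqs[word], size)
```

Without the stop word filter, “the” would dominate the cloud even though it says nothing about the content, which is why the predefined lists matter.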
Because the tools are modular, they also interact with each other. For example, when the user clicks on a word in the Cirrus, more information about that word is displayed, such as its positions within the text and its relative frequency within segments of the text. In addition, Voyant Tools shows which words are used in the text and how many times each appears. Furthermore, if more than one document is given to Voyant Tools, the user can study what the documents have in common and what is unique to each.
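To make these measures concrete, here is a small Python sketch of word positions, relative frequency per segment, and a shared versus unique vocabulary comparison. The function names and the equal-sized segmentation are assumptions for illustration and do not describe how Voyant Tools computes them internally.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_positions(tokens, word):
    """Return the token indexes at which the word occurs."""
    return [i for i, t in enumerate(tokens) if t == word]


def relative_frequency_by_segment(tokens, word, segments=4):
    """Split the tokens into roughly equal segments and return the
    share of each segment that is taken up by the word."""
    n = len(tokens)
    shares = []
    for s in range(segments):
        chunk = tokens[s * n // segments:(s + 1) * n // segments]
        shares.append(chunk.count(word) / len(chunk) if chunk else 0.0)
    return shares


def compare_vocabularies(doc_a, doc_b):
    """Return the words shared by two documents and those unique to each."""
    a, b = set(tokenize(doc_a)), set(tokenize(doc_b))
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    tokens = tokenize("whale sea whale ship sea sea whale ship")
    print(word_positions(tokens, "whale"))
    print(relative_frequency_by_segment(tokens, "whale", segments=2))
    print(compare_vocabularies("the whale swims", "the ship sails"))
```

Plotting the per-segment shares in order gives a simple trend line showing where in the text a word is concentrated.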
The other tool we introduce in this post is PhiloLogic. PhiloLogic is a text search, retrieval and analysis tool designed specifically to handle large collections of texts. It is distributed as Free Software and can be downloaded from the PhiloLogic website.
Although PhiloLogic is also a text analysis tool and shares some features with Voyant Tools, the two differ somewhat in design and use. PhiloLogic is designed to handle large databases of texts efficiently, and it can work with plain text, Dublin Core and DocBook formats as well as process XML data. With its strong XML support, PhiloLogic can store extensible bibliographic information about the corpus. It also has a MySQL back-end, so it can handle standard database queries. Together these features make PhiloLogic a powerful tool for running searchable online libraries built on large corpora. A very good example of a website using PhiloLogic is Chambers’ Cyclopedia, where the digital scans of one of the first encyclopedias published in the English language can be searched. Because the encyclopedia was printed in the first decades of the 18th century, its English spelling conventions differ from today's. However, PhiloLogic’s similarity function lets the user conduct searches using modern English spellings.
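The idea of matching a modern spelling against historical variants can be illustrated with Python's standard `difflib` module. This is only a sketch of approximate string matching in general, not PhiloLogic's similarity function; the word list, the `historical_variants` helper and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical sample of word forms as they might appear in an
# early 18th-century text (chosen for illustration only).
CORPUS_VOCABULARY = ["publick", "musick", "chymistry", "oeconomy", "water"]


def historical_variants(modern_word, vocabulary, cutoff=0.8):
    """Return corpus word forms that closely resemble the modern spelling.

    difflib.get_close_matches ranks candidates by a similarity ratio
    between 0 and 1 and keeps those at or above the cutoff.
    """
    return difflib.get_close_matches(modern_word, vocabulary, n=5, cutoff=cutoff)


if __name__ == "__main__":
    for query in ["public", "chemistry", "economy"]:
        print(query, "->", historical_variants(query, CORPUS_VOCABULARY))
```

A search front end could expand the user's query with the returned variants before running it against the corpus, so that typing “public” also finds “publick”.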
In conclusion, Voyant Tools and PhiloLogic are both powerful tools for text analysis. Although they operate in the same field, they are used differently. Voyant Tools offers an easy-to-use interface that handles complex operations without much effort from the user, whereas PhiloLogic gives the user the ability to manage large databases of texts and run analyses on them.