
Big data analysis is one of the most powerful techniques for getting complicated things done (http://en.wikipedia.org/wiki/Big_data gives an introduction to big data). Not convinced? Few would deny that campaigning in a presidential election is one of the most difficult jobs in the world, so consider the 2012 presidential election in the United States of America. Barack Obama’s campaign team used big data analysis to allocate campaign resources more efficiently and won the election; the report is available at http://www.bloomberg.com/video/how-did-big-data-help-obama-campaign-eGYfo5P8Qd2hoCjcesaF5Q.html. If that convinces you, we can try big data analysis in the field of humanities research.

The first paper presents an approach to authorship attribution for “Dream of the Red Chamber (DRC)”, one of the greatest Chinese novels. For those unfamiliar with the novel, see http://en.wikipedia.org/wiki/Dream_of_the_Red_Chamber. Researchers have long wondered whether the novel was written by more than one author, and there are two competing arguments. One is that the whole novel was written by a single author; the other is that the first 80 chapters were written by one author and the remaining 40 chapters by another. In this paper, the authors not only design a text mining function but also apply traditional approaches, such as term frequencies in each chapter of the novel, to address the problem. The paper supports the second argument, that the first 80 chapters and the last 40 chapters were written by different people. The authors also show that chapters 64 and 67 may have been written by someone other than the two authors researchers usually assume.
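To make the term-frequency idea concrete, here is a minimal Python sketch. It is not the paper's method: the function names, the character-level tokenization, the distance measure, and the toy strings are all assumptions chosen for illustration. It computes the relative frequency of a few chosen terms in each chapter, averages them over two groups of chapters, and measures how far apart the two average profiles are.

```python
from collections import Counter


def relative_frequencies(text, terms):
    """Return the share of tokens in `text` accounted for by each term.

    Tokens are single characters with whitespace dropped, a crude stand-in
    for real Chinese word segmentation (an assumption of this sketch).
    """
    tokens = [ch for ch in text if not ch.isspace()]
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty chapters
    return {term: counts[term] / total for term in terms}


def mean_profile(chapters, terms):
    """Average the per-chapter relative frequencies over a group of chapters."""
    profiles = [relative_frequencies(chapter, terms) for chapter in chapters]
    return {t: sum(p[t] for p in profiles) / len(profiles) for t in terms}


def profile_distance(a, b):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a[t] - b[t]) for t in a)


# Toy placeholder strings, NOT text from the novel.
early_chapters = ["aabac", "abaca"]
late_chapters = ["bbcbc", "cbbcb"]
terms = ["a", "b", "c"]

early = mean_profile(early_chapters, terms)
late = mean_profile(late_chapters, terms)
print(profile_distance(early, late))
```

A large distance between the two groups would be consistent with different writing habits, though on its own it cannot prove different authorship; a real study would use proper segmentation, many more terms, and a statistical test.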

The second paper presents a text analysis smartphone application, Textal, developed by University College London (UCL). Textal is now free to download at http://www.textal.org/ and can be found on Twitter at @textal. With the number of smartphone users increasing, the authors are confident that they are the first to build stand-alone software from scratch that lets a wider audience enjoy the beauty of text analysis in digital humanities. To bring the software to the world, they intend to publish it in many languages. Textal provides users with a visually oriented interface that includes word clouds, graphs, charts, and word lists, and the results can be made available online or via social media. Since UCL owns the infrastructure behind the software, the authors are able to track records of users’ text analyses and present them as case studies in other papers.

Citation is one of the critical tools for gauging how important a research paper is. Citation analysis, however, is not as easy to apply in the humanities as in science and technology, for several reasons. One is that citation data is hard to obtain for humanities scholarship, especially for the rare, older sources that humanities researchers depend on heavily. The second reason concerns co-citation. Studies show that humanists, when they reference papers, are more likely than scientists to use integral references, but say little about the relationship between their own views and those references. Besides, humanists seldom publish articles with multiple authors and do not credit peers as frequently as scientists do. In the third paper, the authors propose an online citation extraction tool and a classifier to address the problem. The tool examines citations and returns how many times a given reference is cited and where the citations appear in a document. The classifier places each citation on a positive or negative scale, which indicates whether a specific sentence or document agrees or disagrees with a given topic. By combining the tool and the classifier to examine frequency, location-in-document, and polarity (the scale of positive or negative views) in any given document, researchers can easily relate their work to the references.
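The three signals described above (frequency, location-in-document, and polarity) can be illustrated with a short Python sketch. This is only a toy under stated assumptions, not the tool or classifier from the paper: the function name `citation_report`, the cue-word lists, the sentence splitter, and the sample text are all hypothetical, and a real classifier would be trained on labelled data instead of relying on a handful of words.

```python
import re

# Hypothetical cue words; a real classifier would be trained on labelled data.
POSITIVE = {"confirms", "supports", "agree", "convincingly"}
NEGATIVE = {"fails", "contradicts", "disagree", "overlooks"}


def citation_report(text, reference):
    """Report the sentences that mention `reference`.

    Each mention carries a relative position in the document and a naive
    polarity score (positive cue words minus negative cue words).
    A sentence that mentions the reference twice is counted once.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    mentions = []
    for index, sentence in enumerate(sentences):
        if reference not in sentence:
            continue
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        polarity = len(words & POSITIVE) - len(words & NEGATIVE)
        mentions.append({
            "position": index / max(len(sentences) - 1, 1),  # 0.0 = start, 1.0 = end
            "polarity": polarity,
        })
    return {"count": len(mentions), "mentions": mentions}


# Invented sample text, used only to exercise the function.
sample = (
    "Smith (2010) convincingly supports the early dating. "
    "Other scholars remain neutral. "
    "However, Smith (2010) overlooks the manuscript evidence."
)
report = citation_report(sample, "Smith (2010)")
print(report)
```

Here the reference is found in two sentences, once near the start with a positive score and once at the end with a negative score, which is the kind of summary that lets a reader see how a source is being used and not just that it is cited.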

All of these papers demonstrate how big data analysis can help humanists do research. Textal provides a convenient tool for doing text mining anywhere on a smartphone, and the results can be accessed online as graphs and charts, which help beginners interpret them more quickly. The first paper uses term frequency to tackle authorship attribution problems in one of the greatest novels. Term frequency is a straightforward measure in text mining. If the authors took into account the ideas of location-in-document and polarity from the third paper, they might be able to resolve the third authorship problem they encountered in their research. However, Chinese sentences and articles are written in a very different way, so the ideas proposed in the third paper may or may not help with the authorship problem in a Chinese novel. One thing we are sure about is that the software developed by UCL will be quite useful if a multi-language interface becomes available.