Authentic Data, content analysis, corpora and corpus activities, data mining, Digital Corpus Resource, English, French Studies, French Text Messages, Information Retrieval, Interdisciplinary Collaboration, Interface And User Experience, linguistics, Mediated Electronic Discourse, natural language processing, SMS, text analysis, text mining, topic modeling, visualization
Technology has become an important tool in many fields, one of which is text analysis. The way in which information is expressed can itself be informative. Understanding how people present information in writing can be extremely useful; however, analysing such vast amounts of text would take a lifetime without the help of technology. Research on how to analyse text has therefore become highly valued and sought after. Text analysis can be performed with technology in many different ways; the three articles chosen for this assignment each examine one approach: a visualization tool, a topic modelling algorithm, or a corpus that excludes full transcoding.
Pöckelmann, Medek, Molitor, and Ritter (2015) investigated whether an interactive visualization tool could support the investigation of text genesis, a topic of particular interest to fields such as sociology, philology, and history. They suggested that differences found within manuscripts should be presented in a visually appropriate way, and proposed the Colored & Aligned Texts view (CATview) to let scholars explore those differences more effectively and to aid the editing process. CATview was found to be a successful interactive tool for text comparison because it facilitates the exploration and navigation of differences between multiple texts and presents them in a graphical overview.
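The general idea of aligning two versions of a text and summarising their differences in a compact overview can be illustrated with a minimal sketch. This is not CATview's own algorithm or code; it assumes Python's standard `difflib` module as a stand-in aligner, and the function names, symbols, and sample sentences are hypothetical.

```python
from difflib import SequenceMatcher

# One symbol per kind of difference, standing in for the colours of a
# graphical overview. The symbol choice is an assumption for illustration.
SYMBOLS = {"equal": "=", "replace": "~", "delete": "-", "insert": "+"}


def align(version_a, version_b):
    """Align two lists of text segments and label each aligned block."""
    matcher = SequenceMatcher(a=version_a, b=version_b, autojunk=False)
    return [
        (tag, version_a[i1:i2], version_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


def overview_strip(blocks):
    """Render the alignment as one symbol per segment position."""
    return "".join(SYMBOLS[tag] * max(len(a), len(b)) for tag, a, b in blocks)


# Invented sample versions of a short passage, one segment per sentence.
version_a = [
    "The night was dark.",
    "He opened the door.",
    "Nobody was there.",
]
version_b = [
    "The night was dark.",
    "He slowly opened the door.",
    "Nobody was there.",
    "He waited.",
]

blocks = align(version_a, version_b)
strip = overview_strip(blocks)
print(strip)  # prints "=~=+"
```

Each character of the strip corresponds to one segment, so a reader can see at a glance where two versions agree, where a segment was rewritten, and where material was added.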
Alternatively, Panckhurst (2015) built on previous research from the sms4science initiative and focused on 88milSMS, a digital resource that excludes full transcoding and standardised tagging. Its aim was to contribute to a worldwide database that would allow the analysis of authentic text messages. It was suggested that excluding these phases allows a more neutral form of annotation, so that researchers can pursue their scientific enquiries without the constraints of a single theoretical framework. Panckhurst (2015) argues that this new digital corpus could shape the future of text message analysis. Although statistical data were not provided, the theoretical underpinning of 88milSMS made a convincing argument for its importance in text message analysis.
Furthermore, Schöch (2015) applied topic modelling to French crime fiction in order to explore whether there were topic-related patterns. It had already been suggested that topic modelling could be a useful method for investigating the history of French crime fiction; this study, however, aimed to answer more specific questions, such as the prevalence of topics and the relations between them. The results revealed topic-related patterns that offer a new perspective on the history of French crime fiction, as well as new avenues of research into topic modelling.
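To make "prevalence" and "relations between topics" concrete, the following sketch shows one common way such questions can be asked once a topic model has assigned topic proportions to each document. It is not Schöch's procedure; the topic labels and numbers are invented toy data, and the use of mean proportion and Pearson correlation is an assumption made for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical topic labels and invented per-novel topic proportions.
# Each row is one novel; each column is the share of one topic.
TOPICS = ["investigation", "violence", "domestic life"]
doc_topics = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
]


def prevalence(rows):
    """Average share of each topic across all documents."""
    return [mean(column) for column in zip(*rows)]


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


columns = list(zip(*doc_topics))
topic_prevalence = prevalence(doc_topics)

for name, share in zip(TOPICS, topic_prevalence):
    print(f"{name}: {share:.2f}")

# A negative value means two topics tend not to appear in the same novels.
print(round(pearson(columns[0], columns[2]), 2))
```

In this toy data the first and third topics are strongly negatively correlated, which is the kind of relation between topics that such an analysis is meant to surface.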
It is clear from the articles summarized above that text analysis using various forms of technology has evolved considerably. Each article provided convincing arguments for the use of these methods, highlighted the importance of technology in text analysis, and suggested future avenues of research.
 _CATview_ – Supporting The Investigation Of Text Genesis Of Large Manuscripts By An Overall Interactive Visualization Tool. Marcus Pöckelmann, Martin-Luther-University Halle-Wittenberg, Germany; André Medek, Martin-Luther-University Halle-Wittenberg, Germany; Paul Molitor, Martin-Luther-University Halle-Wittenberg, Germany; Jörg Ritter, Martin-Luther-University Halle-Wittenberg, Germany
 ‘88milSMS’, A New Digital Corpus Resource Of French Text Messages: Why We Chose To Exclude Full Transcoding And Standardised Tagging. Rachel Panckhurst, Praxiling UMR 5267 CNRS, Université Paul-Valéry Montpellier, France
 Topic Modeling French Crime Fiction. Christof Schöch, University of Würzburg, Germany