The rapid growth of computing power and memory capacity has opened a new branch of research for scientists: Big Data analysis. This powerful tool makes it possible to work with huge amounts of information in a short period of time. It was once a privilege of scientists and high-tech companies, but over time the concept has evolved and moved towards the humanities and ordinary people. These changes raise many questions and opportunities, among them the optimization of human-machine interaction, data visualization, and new areas of research.

Text analysis is a complex and time-consuming process; stylometry is one example of such analysis. Studying a single handwritten medieval text takes time; now imagine having several hundred thousand texts and books. The task looks unrealistic even with enough time, people, and money, and often none of these is available. Text analysis is therefore a good example of where the Big Data concept can be applied, and this has already been done: the Patrologia Latina collection was studied in order to build a system that can identify the authors of anonymous texts.

In the first part of the experiment, books with known authors were analyzed according to criteria such as frequent words and lemmata. After a set of “training experiments” in author identification on a small group of texts, the main experiment was launched. Using the same criteria, the final experiment showed that the system can correctly guess the author in at most 50% of cases. Further optimization of the identification criteria could improve performance and, as a consequence, help attribute the huge number of anonymous texts and books stored worldwide. [1]
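To make the idea of attribution by frequent words more concrete, here is a minimal sketch in Python. It is not the method used in [1]: it assumes a simple nearest-profile comparison in the style of Burrows's Delta, and the function names (`tokenize`, `most_frequent_words`, `attribute`) and the toy texts are hypothetical. It counts the most frequent words in texts of known authorship, standardizes their relative frequencies, and assigns an unknown text to the author whose profile is closest.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def most_frequent_words(texts, n):
    """Return the n most frequent words across all given texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def relative_freqs(tokens, vocabulary):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def attribute(known, unknown, n_words=50):
    """Guess the author of `unknown`.

    `known` maps an author name to a sample text (illustrative assumption).
    The distance is the mean absolute difference of z-scored word
    frequencies, in the style of Burrows's Delta.
    """
    vocab = most_frequent_words(known.values(), n_words)
    profiles = {a: relative_freqs(tokenize(t), vocab) for a, t in known.items()}

    # Mean and standard deviation of each word's frequency across authors.
    columns = list(zip(*profiles.values()))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # avoid /0
        for c, m in zip(columns, means)
    ]

    def z_scores(vec):
        return [(x - m) / s for x, m, s in zip(vec, means, stds)]

    target = z_scores(relative_freqs(tokenize(unknown), vocab))

    def delta(author):
        profile = z_scores(profiles[author])
        return sum(abs(p - q) for p, q in zip(profile, target)) / len(vocab)

    return min(profiles, key=delta)


# Toy data, invented purely for illustration.
known_texts = {
    "A": "the cat and the dog and the bird and the fish",
    "B": "of war of peace of men of arms of kings",
}
guess = attribute(known_texts, "the fox and the hen and the owl")
```

A real study would use lemmatized Latin texts, far larger samples, and cross-validation over many authors; the sketch only shows how frequent-word criteria can be turned into a distance between texts.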

Many innovations have been integrated into society, such as phone lines, the internet, and social networks. Big Data is no exception to this trend and is currently on its way. There are servers that allow people to use this technology in their own research areas, but a lack of technical skills or of desire makes this highly problematic for them. Scientific groups are trying to close this gap between users and the system by using a flexible platform and a native application programming interface. One example of such a project is the National eResearch Collaboration Tools and Resources (NeCTAR) in Australia, where a balance was reached between simplicity and the capabilities of the system. In other words, an intuitive interface was combined with the full functionality of the Big Data concept, so the tool is available to a number of humanities research projects. [2]

Modern science strongly suggests that information about objects is much easier to perceive as an image. About 80% of all new information is processed by the human eye. It turns out that complex, multilevel information is easier to process when it is received as a picture. A picture is a very convenient means of communication. It can be very information-dense, conveying a set of values, meanings, and shades through its various parts. Big Data, however, means thousands or millions of elements, so even a pictorial presentation of the information remains difficult for a human to analyze. Research results should be easy to read and should allow hypotheses to be formed from them, so the question of the most suitable representation of information arises. It should also be taken into account that all experiments include “noise” data, which is a major headache from the Big Data point of view. Andrew Goldstone and Snyder (2013) proposed working with big data using several data visualization tools in order to eliminate “noise”. A high level of flexibility and detail in visualization is a criterion for new concepts of data representation. [3]

As we have seen, the concept of Big Data is evolving and moving from scientists to society, becoming a powerful tool for people through visualization, the easiest way to present vast amounts of information. It will become more accessible to people without a technical background thanks to “friendly interfaces”, support systems, and so on. There is no doubt that this concept will influence many aspects of social life.

1. Maciej Eder, et al. Taking Stylometry to the Limits: Benchmark Study on 5,281 Texts from “Patrologia Latina”. DH2015, Sydney, Australia, June 29 – July 3, 2015.

2. Jonathon Hutchinson, et al. Social Media Data: Twitter Scraping on NeCTAR. DH2015, Sydney, Australia, June 29 – July 3, 2015.

3. John Montague, et al. Exploring Large Datasets with Topic Model Visualizations. DH2015, Sydney, Australia, June 29 – July 3, 2015.