In this blog post, I discuss three articles about building and using interactive visualization tools to discover the information hidden in massive datasets.

The first article is titled “Exploring Large Datasets with Topic Model Visualizations”. The work is inspired by philosophy journals from JSTOR. It describes a topic model visualization solution dedicated to exploring large datasets that consist of many entries spread over a long time period and a large set of topics. The authors first summarize the status quo of topic model visualizations and suggest which one to choose for different applications. Their general topic model visualization tool is implemented in 3D and focuses on letting users explore large datasets to reveal new information, rather than only comprehend and analyze a dataset. The tool merges multiple existing visualization solutions into a single faceted browsing paradigm for exploring and analyzing document collections. It also provides a zoomable graphical user interface that presents word clouds generated by topic modelling, histograms, line graphs, and network diagrams.

The goal of this article is to show how attractive it is to apply topic model visualization techniques to large datasets and thereby provide more targeted affordances for exploration. A topic model is a statistical model for discovering the hidden topics behind a document, which is useful in natural language processing. The intuition behind the model is quite straightforward: the words that belong to a topic tend to appear more frequently in the document than other words. For example, in this blog post, I am pretty sure the word “visualization” will appear much more often than other words (except “a”, “an”, “the”, “to”, etc.). Would you like to bet?
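The bet above can be checked with a few lines of code. The sketch below only counts word frequencies after dropping a handful of common words; it is not a real topic model and not the method used in the article. The `top_words` function, the stop word list, and the sample text are all made up for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "in", "is", "for"}


def top_words(text, n=3):
    """Return the n most frequent non-stop words in text, with their counts."""
    # Lowercase the text and keep only runs of letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Invented sample text, standing in for a real document.
sample = (
    "Visualization tools help explore data. "
    "A good visualization is a visualization that reveals the data."
)
print(top_words(sample, 2))
```

A real topic model goes further than this: it groups words that tend to occur together across many documents, instead of ranking words within a single one.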

Now that large datasets can be explored quickly, it becomes necessary to know how to visualize linked and related data so that people can retrieve information more efficiently. This is well illustrated in the second article, from John Simpson, entitled “Building Better Linked Data & Ontology Visualization Tools”. It asks what features a semantic web visualization tool should have to maximize the discovery of new information. To answer this question, the article first reviews and evaluates 30 existing semantic web-related visualization tools. It then reports on a completed network visualization tool that non-expert users can use to explore linked data. The result is an ontology visualization that is both exhaustive and understandable: it is printable but also provides intuitive ways to explore the ontology interactively. Other expectations are met as well, such as the extraction and clear representation of hierarchy and predicates.

An ontology is an explicit specification of a conceptualization. Generally, it is not easy to construct an ontology visualization. However, it seems the authors have built a quite comprehensive linked data visualization tool for large data, even though they have not yet reported the results of user testing or their insights and recommendations.
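To make the idea of hierarchy and predicates more concrete, here is a minimal sketch that treats linked data as subject, predicate, object triples and prints the class hierarchy as an indented tree. It is not the tool from the article. The triples, the predicate names `subClassOf`, `type`, and `wrote`, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), invented for this sketch.
TRIPLES = [
    ("Philosopher", "subClassOf", "Person"),
    ("Historian", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
    ("Kant", "type", "Philosopher"),
    ("Kant", "wrote", "Critique of Pure Reason"),
]


def build_hierarchy(triples, predicate="subClassOf"):
    """Map each parent class to a sorted list of its direct subclasses."""
    children = defaultdict(list)
    for subject, pred, obj in triples:
        if pred == predicate:
            children[obj].append(subject)
    return {parent: sorted(kids) for parent, kids in children.items()}


def render_tree(hierarchy, root, depth=0):
    """Return the hierarchy below root as a list of indented text lines."""
    lines = ["  " * depth + root]
    for child in hierarchy.get(root, []):
        lines.extend(render_tree(hierarchy, child, depth + 1))
    return lines


hierarchy = build_hierarchy(TRIPLES)
print("\n".join(render_tree(hierarchy, "Agent")))
```

Only the `subClassOf` triples end up in the tree. The other predicates would be the edges of a network diagram, which is where an interactive tool becomes much more useful than plain text.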

There are also many interesting projects that apply visualization tools to discover the information hidden in “big data”. For example, as larger and larger archives of human cultural output accumulate, the deluge of information has become an increasing frustration for historians. The third article, entitled “Everything on Paper Will Be Used Against Me: Quantifying Kissinger”, sets out to tackle this problem. The research visualizes documents related to Kissinger, the frequency of words (laughter, bombing, etc.) in the Kissinger memcons, and so on. The project helps us understand more deeply not only Kissinger’s foreign policy but also his personal emotional motivations. By combining computational and emotional history approaches, we gain more insight into this man and the geopolitical focus of the administration he served.

Visualizing big data in the humanities is an important and demanding topic. When powerful visualization tools are applied well to massive historical, cultural, and humanities data, I think we will arrive at more and more useful findings. Also, beyond professional researchers, more ordinary people will take an interest, which will be good for the development of less popular subjects.


J. Montague, J. Simpson, G. Rockwell, S. Ruecker, S. Brown. Exploring Large Datasets with Topic Model Visualizations.

J. Simpson, S. Brown, J. S. Elford, S. Murphy, M. Brundin, R. Warren. Building Better Linked Data & Ontology Visualization Tools.

M. Kaufman. Everything on Paper Will Be Used Against Me: Quantifying Kissinger.