Marilyn Monroe, Harry Potter, and Cristiano Ronaldo: when we talk about popular culture in modern society, it is widely acknowledged that people's interests vary greatly between cultural circles and evolve rapidly over time. The study of popular culture has therefore long been troubled by the uniqueness and transience of its subjects and by a lack of methodology. However, thanks to the development of data science and its application in the digital humanities, we can now capture the information underlying these cultural phenomena in a convincing and robust fashion.

Let’s start with the National Football League (NFL), the most popular sports league in the United States. The NFL draft, in which teams select young players from college football, has become a much-anticipated annual event for NFL fans. For them, draft prediction is just an apéritif before they enjoy the show; with the help of data mining, however, the predictions can teach us more about the business. A study [1] analyzed the accuracy of NFL draft predictions made by the league's best pundits. The predictions were compared with the draft outcome, and the agreement was quantified and visualized using Pearson’s correlation coefficient. The results show that, whether they come from independent pundits or from scouting databases, most predictions have similar accuracy in the first round. Starting from the third round, the independent predictions begin to fade. The most interesting result comes from the seventh round, where two clusters of predictions emerge, each sharing a similar accuracy (see Fig.1). By this point we can assume that every prediction still ‘in competition’ is supported by some scouting database, and we can further infer that all predictions within a cluster draw on the same database.


Fig.1 Accuracy in round 7 – two clusters emerge here
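To make the accuracy measure concrete, here is a minimal sketch of how a pundit's predicted pick numbers could be scored against the actual draft order with Pearson's correlation coefficient. The player data are invented for illustration, and the study's own preprocessing (for example, how undrafted players are handled) is not described here, so this is only an assumption about the general approach.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: predicted vs. actual overall pick numbers for five players.
predicted = [1, 2, 3, 4, 5]
actual = [2, 1, 3, 5, 4]

score = pearson(predicted, actual)
print(f"correlation with the draft outcome: {score:.2f}")  # prints 0.80
```

A coefficient close to 1 means the pundit's ordering closely tracks the real draft; predictions that rely on the same database would be expected to produce similar coefficients, which is one way clusters like those in Fig.1 could arise.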

In the end, the study manages to identify the exact database behind each cluster of predictions. Professional sports is, notably, a business full of data. In Major League Baseball (MLB), another major sports league in the US, the Oakland Athletics have used a statistical approach to build their team since 2002. In that season they had one of the lowest budgets in the league, yet they won their division and put together a 20-game winning streak along the way. The story was documented and made into a film named Moneyball [2]. The concept of Moneyball is now everywhere, and more and more elite teams across all kinds of sports hire data analysts to assess players, opponents and transfers.

Now we turn from the art of handling data to that of generating useful and valuable data. A dramatic storyline and attractive main characters are not enough for a good video game, novel or movie. Good works make the fictional universe feel real because they have well-developed supporting characters, rich backgrounds and a plausible internal logic. However, generating these details by traditional methods, in which professional writers develop everything by hand, is demanding and expensive. A tool [3] based on a Bayesian model was developed to generate supporting characters for a fictional universe. This tool, written in Inform 7 (a programming language for interactive fiction), builds a character starting from birthplace and natural features, then adds a profession and life-changing events in turn, all of which results in a set of interactive experiences consistent with the character's features. With a set of randomized low-level features and a suitable data model as the quality controller, it manages to produce more complex features and experiences that fit the set-up and logic of the fictional universe. The study took the Game of Thrones universe as its example, and the auto-generated characters were generally well received in a survey, even among respondents who were very familiar with that universe.
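The step-by-step generation described above can be sketched as sampling from a chain of conditional probability tables, where each choice depends on the one before it. This is not the authors' Inform 7 tool: the sketch is in Python, and every birthplace, profession, event and probability below is a hypothetical value invented for illustration.

```python
import random

# Hypothetical probability tables, invented for illustration only.
BIRTHPLACES = {"northern hold": 0.5, "port city": 0.3, "river village": 0.2}

PROFESSION_GIVEN_BIRTHPLACE = {
    "northern hold": {"guard": 0.6, "hunter": 0.3, "scribe": 0.1},
    "port city": {"sailor": 0.5, "merchant": 0.4, "scribe": 0.1},
    "river village": {"farmer": 0.7, "miller": 0.2, "guard": 0.1},
}

EVENT_GIVEN_PROFESSION = {
    "guard": {"survived a siege": 0.7, "deserted a post": 0.3},
    "hunter": {"lost in a blizzard": 0.6, "saved a traveller": 0.4},
    "scribe": {"copied a forbidden text": 0.5, "served a lord": 0.5},
    "sailor": {"shipwrecked": 0.5, "crossed the sea": 0.5},
    "merchant": {"went bankrupt": 0.4, "made a fortune": 0.6},
    "farmer": {"lost a harvest": 0.6, "inherited land": 0.4},
    "miller": {"mill burned down": 0.3, "fed a passing army": 0.7},
}


def draw(table, rng):
    """Pick one outcome from a {outcome: probability} table."""
    outcomes = list(table)
    weights = [table[outcome] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_character(rng):
    """Sample birthplace, then profession given birthplace, then event given profession."""
    birthplace = draw(BIRTHPLACES, rng)
    profession = draw(PROFESSION_GIVEN_BIRTHPLACE[birthplace], rng)
    event = draw(EVENT_GIVEN_PROFESSION[profession], rng)
    return {"birthplace": birthplace, "profession": profession, "event": event}


# A seeded generator makes the example reproducible.
character = generate_character(random.Random(7))
print(character)
```

Because each table is conditioned on the previous choice, a character can never end up with a combination the tables rule out (a sailor from the river village, say), which is the sense in which the data model acts as a quality controller.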

The last topic we present here is data visualization for social media. The last decade saw social media come into its prime, and it has already become a regular part of any study of popular culture. Social media data are massive, unorganized and diverse, so processing them calls for a visualization method that can extract and represent what is valuable. For example, a geographic information system (GIS) ontology [4] has been employed to track the 2014 Bloomsday in Dublin, Ireland, the annual celebration of the Irish avant-garde writer James Joyce and his novel Ulysses, one of the most celebrated stream-of-consciousness novels in literary history. Inspired by the schema of Ulysses, this GIS model uses key elements of the novel, such as time, symbol and color, to stratify data from Twitter, Flickr and YouTube. To establish a digital ecosystem, the structured data are then visualized on Google Maps, so that we can see the cultural flow throughout the day. The method is a riveting attempt because it not only visualizes the data but also builds the Linati schema, which Joyce created to help readers conceptualize Ulysses, into the GIS ontology.


Fig.2 Visualization of social media on Bloomsday
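As a rough illustration of what stratifying posts by time might look like, the sketch below sorts geotagged posts into time-of-day layers, each with a label and a display color. The layer boundaries, labels and colors are placeholders and are not taken from the Linati schema; the `Post` type, the sample posts and their coordinates (roughly central Dublin) are likewise hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    source: str          # e.g. "twitter", "flickr", "youtube"
    posted_at: datetime
    lat: float
    lon: float


# Placeholder layers: (start hour, end hour, label, color).
# These are NOT the entries of the Linati schema.
LAYERS = [
    (8, 12, "morning", "gold"),
    (12, 18, "afternoon", "green"),
    (18, 24, "evening", "blue"),
]


def layer_for(post):
    """Return the (label, color) layer a post falls into, or None if outside all layers."""
    for start, end, label, color in LAYERS:
        if start <= post.posted_at.hour < end:
            return label, color
    return None


def stratify(posts):
    """Group posts by layer so that each layer can be drawn separately on a map."""
    layers = defaultdict(list)
    for post in posts:
        key = layer_for(post)
        if key is not None:
            layers[key].append(post)
    return dict(layers)


# Hypothetical sample posts from 16 June 2014.
posts = [
    Post("twitter", datetime(2014, 6, 16, 9, 30), 53.3498, -6.2603),
    Post("flickr", datetime(2014, 6, 16, 14, 5), 53.3441, -6.2675),
    Post("youtube", datetime(2014, 6, 16, 20, 45), 53.3472, -6.2592),
    Post("twitter", datetime(2014, 6, 16, 3, 0), 53.3498, -6.2603),  # outside all layers
]

layers = stratify(posts)
for (label, color), members in layers.items():
    print(label, color, [p.source for p in members])
```

Each resulting layer could then be passed to a mapping library as a separately colored set of markers, which is one way the flow of activity through the day could be made visible.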

In short, data science has shown its power in the study of popular culture. The key to its success is the ability to overcome the disorder of information, a defining feature of popular culture. It provides insights that the surface of the text does not reveal, saves people from endless iteration in artistic and literary creation, and surely offers many more possibilities than these. If there is a caveat, it is security. Now that popular culture is open for everyone to leave a footprint, we all become data contributors, and celebrities are no longer the only ones who need to keep an eye on personal privacy. Digital security and cryptography, to take a wild guess, might therefore make a giant leap forward in the digital humanities soon.


[1] Harvey Quamen, Matt Bouchard, Andrew Keenan. “On the Clock”: Grading the NFL Draft Pundits. 2015.

[2] Bennett Miller, Brad Pitt, et al. Moneyball. 2011. Wikipedia.

[3] Matthew Parker, Foaad Khosmood, Grant Pickett. Game of Thrones for All: Model-based Generation of Universe-appropriate Fictional Characters. 2015.

[4] Charles Bartlett Travis. A Digital Humanities GIS Ontology: Tweetflickertubing James Joyce’s “Ulysses” (1922). 2015.