Crowdsourcing is a term that has risen rapidly in popularity since 2006, when it was first used by Howe [1]. The trend in Google searches for the term can be seen below (the peak in March 2014 corresponds to a crowdsourcing effort to search satellite images for traces of Malaysia Airlines flight MH370).

Google Trends

Popularity of the term ‘crowdsourcing’ on Google (https://www.google.com/trends/explore#q=crowdsourcing)

The term, however, covers a very broad range of forms of public participation: it can include anything from a large-scale collaborative effort like Wikipedia to business models designed to obtain free or cheap labor.


No post would be complete without an xkcd comic. (https://xkcd.com/1060/)

This has led many humanists, as is customary when confronted with something new, to attempt to define and categorize it [2]; researchers have even catalogued forty different definitions of ‘crowdsourcing’ [3]. From a more pragmatic point of view, however, crowdsourcing has become a very powerful tool in digital humanities. In this digital humanities class, taught by Prof. Kaplan, we ran several collaborative experiments, from simple sorting or research tasks on Framapad to a variant of Chinese Whispers in which we had to repeatedly transcribe a message. Inspired by this, I would like to investigate the relationship between crowdsourcing and the digitalization of libraries.

The massive digitalization of books started a few years ago with many different projects, such as The Internet Archive and Europeana, and proprietary efforts such as Google Books. Once millions of books had been digitalized, humanists started wondering “well, now what?” [4]. Far more books than could possibly be read in a lifetime are now readily available anywhere with an internet connection.

An attempt to answer this question is presented in What Do You Do with a Million Readers? [5]. Alongside the explosion in the digitalization of literature, there has been a parallel explosion of comments, summaries and reviews posted online by readers. The authors present an application of crowdsourcing to analyze the relationships between characters in works of fiction. Through an automated study of Goodreads reviews of selected books, they were able to trace the relationships that exist between the characters in a specific book. The result for The Hobbit by Tolkien is illustrated below. The software was able to deduce, from the reviews alone, the actions that certain characters perform on others.

Character-relationship graph of “The Hobbit”.
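
The idea of reading character relationships out of reviews can be illustrated with a deliberately naive sketch. This is not the method used in [5]: the character list, the verb list, the sample reviews and the function name `extract_relations` are all hypothetical, chosen only to show how (character, action, character) triples might be counted across many reviews.

```python
import re
from collections import Counter

# Hypothetical inputs: a tiny cast and a handful of action verbs.
CHARACTERS = ["Bilbo", "Gandalf", "Smaug", "Gollum"]
VERBS = ["helps", "tricks", "fights", "meets"]

# Matches "<character> <verb> <character>" with only whitespace in between.
PATTERN = re.compile(
    r"\b({names})\s+({verbs})\s+({names})\b".format(
        names="|".join(CHARACTERS), verbs="|".join(VERBS)
    )
)


def extract_relations(reviews):
    """Count (subject, action, object) triples found across all reviews."""
    counts = Counter()
    for review in reviews:
        for subject, verb, obj in PATTERN.findall(review):
            if subject != obj:  # ignore a character acting on itself
                counts[(subject, verb, obj)] += 1
    return counts


# Made-up review snippets, for illustration only.
SAMPLE_REVIEWS = [
    "I loved the part where Bilbo tricks Gollum in the cave.",
    "Gandalf helps Bilbo a lot. Later Bilbo tricks Gollum with riddles.",
    "Bilbo meets Smaug near the end.",
]

if __name__ == "__main__":
    relations = extract_relations(SAMPLE_REVIEWS)
    for (subject, verb, obj), n in relations.most_common():
        print(f"{subject} --{verb}--> {obj}  (seen {n}x)")
```

Each counted triple can be read as a labeled, weighted edge in a character graph. A realistic system would presumably need proper language processing to handle pronouns, nicknames and free-form phrasing, which this sketch ignores.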

Readers can also act together to enrich content, as illustrated in [2]. One example of this is provided by Hashimoto [6] for the Kindai Digital Library, an online collection of out-of-copyright books published in Japan during the 19th and 20th centuries. After realizing that there was public interest in the books and that many readers were discussing and exchanging ideas and knowledge on different platforms, the project KinDigi Social was born to collect this user-generated content and give it a home. The readers thereby become active participants in the archive by annotating the text, sharing their knowledge and collaborating with scholars.

But can the crowd also be used to perform complicated tasks that require specific skills? Hakkarainen [7] refers to this as nichesourcing and poses these questions in the context of a project of the National Library of Finland, the Digitalization Project of Kindred Languages. The task is to correct and annotate a large number of monograph titles and newspaper articles published in the Soviet Union during the 1920s and 1930s. These are written in 17 different Uralic languages, some of which are considered endangered or close to extinction. The documents bear witness to an era of renaissance for these languages, in which orthographic and lexical norms were renewed through a proliferation of popular literature. The challenge, therefore, is to correct the digitalized versions with the help of linguists or people knowledgeable in these languages, who are few, scattered around Russia and may not have an internet connection. Time will tell if these challenges can be overcome by an interplay between individuals with different skill sets in different language communities.

To conclude, the digitalization of libraries has introduced a paradigm shift: digital libraries are not static entities but living organisms. Online collaboration between readers and scholars alike fosters the growth of the digital library, enriching it with users’ knowledge and connections to further resources. Furthermore, the crowdsourced annotation and correction of millions of digitalized books itself becomes a resource that can be studied by the digital humanities.


