

Keywords: Digital Humanities; sound; audio; annotation; archives; sonic features

Sound is underrepresented in the Digital Humanities. Other types of sources, written sources above all, have received far more attention. We are therefore still beginners in the art of integrating sound archives and sound studies into the Digital Humanities, and much work remains before this field is well established. There are several reasons for this gap [1]. First, sound archives are quite new compared with written texts or pictures: the first reproducible recording dates only from 1878 [2]. Because of this novelty, there are neither mature tools nor well-established methods for analyzing sound. There are also difficulties in presenting these archives, and the explanations around them, in a format that is understandable and easily accessible to anyone. Further constraints arise because many of these sources are under copyright. Last but not least, analyzing sound is far more complex and time-consuming than analyzing text, because a recording contains more kinds of data that can be examined; I will give examples of such data later.
Some solutions to these issues are under development. I will split the discussion into two parts: the first concerns the problem of archiving audio files; the second deals with analyzing them, a challenge that stems from their richness.

Archiving audio sources

How can we archive audio sources in a way that enables anybody to find and understand them? We have developed systems that suit texts and images, but there is currently no well-established archival system for sound.

First, for an audio file to be easy to find, it must be well indexed [1], which means its content has to be described. Manual annotation is not manageable at this volume of data, so annotation must be automated, and a reliable tool for this has yet to be built. Even though speech recognition and computer vision are increasingly advanced and helpful, the low quality of some samples still hinders automatic indexing at scale.
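To make the idea of automatic description concrete, here is a minimal sketch of the simplest kind of automated annotation: splitting a signal into frames and tagging each one with cheap acoustic descriptors that an index could then store. The function names, thresholds, and the synthetic signal are all illustrative assumptions, not any particular archive's method.

```python
import math

def frame_features(samples, frame_size=1024):
    """Split a signal into frames and compute two cheap descriptors per
    frame: RMS energy (loudness) and zero-crossing rate (a noisiness cue)."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / frame_size
        features.append({"start": start, "rms": rms, "zcr": zcr})
    return features

def label_frames(features, silence_rms=0.01):
    """Tag each frame: 'silence' below an energy threshold, else 'sound'."""
    return [
        dict(f, label="silence" if f["rms"] < silence_rms else "sound")
        for f in features
    ]

# Synthetic example: one second of silence followed by a 440 Hz tone.
sr = 8000
silence = [0.0] * sr
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
labels = label_frames(frame_features(silence + tone))
```

A real indexing pipeline would of course go far beyond this, toward speech recognition and learned classifiers, but even descriptors this simple already let a catalogue skip silent stretches and locate sounding passages.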

Secondly, if we want anybody to be able to understand an audio file, we need a tool that allows reading and listening at the same time; otherwise, explanations about the music are hard to follow [3]. Using the score alone would not help non-musicians, so part of the audience would be lost. We could attach an audio file of the piece to the article, but readers would struggle to find the passage the text is discussing. We could also print a QR code (linking to the relevant part of the piece) at the appropriate place in the article, but we would then lose readers who do not have a smartphone. To address this, Joanna Swafford of the University of Virginia created the website "Songs of the Victorians" (http://www.songsofthevictorians.com/) and a companion tool, "Augmented Notes", which lets anyone build the same kind of edition without knowing how to program [3]. Songs of the Victorians displays the score while it is played, with the currently sounding passage circled; on another page we can read commentary on the music or lyrics, with icons placed in the text that, when clicked, play the corresponding part of the song. As a result, everything is there: the complete score, the analysis of the music and lyrics, and the audio file, all linked together appropriately. The website serves two purposes: it is an archive, since it deals with pieces that are hard to find or disappearing, and it can be used to teach scholars, even non-musicians. Nevertheless, one can wonder whether such an analysis of the songs is complete, and to what extent the same can be done with any type of audio recording.
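The core mechanism behind such a score-and-sound edition can be sketched as a timing map: for one performance, the audio timestamp at which each measure begins, so the interface knows which measure to highlight during playback and where to seek when a reader clicks a measure. The timestamps below are invented for illustration; this is not Augmented Notes' actual data format.

```python
import bisect

# Hypothetical timing map for one performance: the audio timestamp
# (in seconds) at which each measure of the score begins.
measure_starts = [0.0, 2.1, 4.3, 6.2, 8.5, 10.9]  # measure 1 starts at 0.0 s

def measure_at(time_s):
    """Return the 1-based number of the measure sounding at a playback time."""
    return bisect.bisect_right(measure_starts, time_s)

def seek_time(measure_number):
    """Return the timestamp to jump to when a reader clicks measure N."""
    return measure_starts[measure_number - 1]
```

With such a map, circling the sounding passage on the score and jumping from an icon in the commentary to the right moment in the audio are both one lookup.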

Analysis of audio sources

What can we extract from an audio file? It depends, of course, on the source. It can be the study of natural sounds, such as birdsong and wind blowing through trees, or of everyday sounds, such as the clicks of a machine. If there is a voice, we can distinguish two aspects [4]: the pheno-song, that is, the speech with its words, and the geno-song, also called the sonic features, which include for example the volume of the voice, its pitch, its tone, and its flow. Some humanists consider that sonic features do not convey feelings on their own and are merely a support for the pheno-song, which is what really carries the message. Others believe that the geno-song is meaningful, but only in association with the pheno-song, because we lack a system for describing and interpreting it alone: studied without the pheno-song, a sonic feature becomes a physical measurement that transmits no feeling, so it is difficult to draw information from the geno-song by itself. In a nutshell, the debate is this: given that the pheno-song has a meaning, does the combination of pheno-song and geno-song make sense only thanks to the pheno-song, or does the geno-song add something? From a purely theoretical point of view, song is certainly just another way to carry a message (as paper is a support for written words). But from an artistic point of view there is something more. If a piece of music feels happy, that is not only because of the words being pronounced but also because of major chords, or because the singer is laughing rather than groaning. Even without attending to the lyrics, you can perceive a message. A good illustration is to listen to a song in a foreign language and try to guess the story being sung: you will probably miss the plot, but you may well detect that it was a particularly sad story, for example.
This means that the geno-song has a meaning, even if it is much more implicit. Artists can play on sonic features to give deeper meaning to the words they pronounce.
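Sonic features like pitch are at least measurable, whatever one concludes about their meaning. As a minimal sketch of how one such geno-song feature can be pulled out of a signal, here is pitch estimation by autocorrelation on a synthetic tone; the function and its search range are illustrative assumptions, not a tool cited in this essay.

```python
import math

def estimate_pitch(samples, sr, fmin=80, fmax=500):
    """Estimate the fundamental frequency of a voiced frame by finding the
    autocorrelation peak within a plausible vocal range (fmin..fmax Hz)."""
    best_lag, best_corr = 0, 0.0
    for lag in range(sr // fmax, sr // fmin + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sr / best_lag if best_lag else 0.0

# Synthetic voiced frame: a 220 Hz tone sampled at 8 kHz.
sr = 8000
frame = [math.sin(2 * math.pi * 220 * t / sr) for t in range(2048)]
pitch = estimate_pitch(frame, sr)
```

Tracking this estimate frame by frame yields the pitch contour of a voice, one of the sonic features the debate above is about.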

We also need to investigate the links between sonic features further: for instance, characterizing the differences between two singers who have performed the same song, and determining how those dissimilarities are linked to other factors such as region or age. We could build a "cultural map" showing the influence of a language on the history of a region, or conversely the influence of the region on the song. Sonic features can give information about the context and the culture that pervade a song [4].
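Comparing two performances of the same song requires a distance measure that tolerates one singer stretching a phrase more than the other. A standard choice is dynamic time warping over the two pitch contours; the contours below are invented for illustration, and this sketch is only one of many ways such comparisons could be made.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two pitch contours, so that
    performances can be compared even if their timing differs."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical pitch contours (Hz) of the same phrase by two singers:
singer_a = [220, 220, 247, 262, 247, 220]
singer_b = [220, 247, 247, 262, 262, 247, 220]  # same melody, held longer
singer_c = [330, 349, 392, 440, 392, 349]       # a different melody
```

Distances like these, computed across many recordings, are the kind of raw material from which a "cultural map" of regional performance styles could be drawn.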

Considering all of this, we need a tool that can detect the different aspects of the voice and the music, including accent, intonation, and tempo, because we need something like a translation of the geno-song: a tool that yields a full account of the song, not only the words pronounced in the recording. Machine learning plays a fundamental role in such tools.

One example of a tool that humanists can use is the ARLO (Adaptive Recognition with Layered Optimization) software [5], with which we can create a spectrogram (showing frequency as a function of time) that lets humanists extract pitch, rhythm, and tone. Another application of a tool such as a spectrogram is the study of the noise induced in a microphone by fluctuations of the electricity supply [5]. This is called the ENF (electric network frequency) signature, and it is specific to the time at which the recording was made: by comparing the ENF of a sample with an unknown recording date against dated samples, we can estimate its age. Here we see the importance of technological development for sound archives, and it opens a debate about what counts as useless data. Signal noise is usually considered a nuisance, but for this dating application it carries valuable information.
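The matching step of ENF dating can be sketched very simply: given dated reference logs of mains-frequency deviations, find the log closest to the trace extracted from the undated recording. The traces, timestamps, and the squared-error criterion below are all illustrative assumptions; real forensic ENF matching works on much longer traces.

```python
# Hypothetical ENF traces: mains-frequency deviations (Hz around the
# nominal 50 Hz) sampled once per second. A real archive would hold
# years of such logs from the power grid.
reference_traces = {
    "2019-03-01T10:00": [0.010, 0.012, 0.008, -0.004, -0.011, -0.006],
    "2019-03-01T11:00": [-0.015, -0.012, 0.002, 0.014, 0.016, 0.009],
    "2019-03-01T12:00": [0.003, -0.001, -0.009, -0.013, -0.008, 0.001],
}

def match_enf(unknown, references):
    """Date a recording by finding the reference ENF trace with the
    smallest summed squared error against the recording's own trace."""
    def sse(ref):
        return sum((u - r) ** 2 for u, r in zip(unknown, ref))
    return min(references, key=lambda ts: sse(references[ts]))

# Trace extracted from the undated recording: a noisy copy of the 11:00 log.
unknown_trace = [-0.014, -0.011, 0.003, 0.013, 0.017, 0.008]
```

Because the grid's frequency wobble is shared by every device plugged in at the same moment, the best-matching log dates the recording: a case where "noise" is exactly the information we want.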

It is important to address these issues of analyzing and archiving sound, because the audio field teems with data useful to the digital humanities. For example, beyond what is literally said in a recording of a voice, a file also carries information about the speaker (through the accent) and about the location and date (through other sounds, such as microphone noise or street noise). Analyzing the emotion in a voice adds a further dimension to the message carried by the language: a voice signal is unmistakably altered when psychological or physiological disturbances affect a speaker, whatever the speaker's culture, and studies have accordingly sought to determine a speaker's emotional state by analyzing the voice [6]. The richness of a sound signal allows a great deal of information to be extracted and brings several disciplines together, which can greatly enrich the digital humanities. Moreover, audio sources are growing enormously, so it is essential to be able to process all the new documents that arrive. Analyzing and archiving audio sources requires combining multiple abilities and domains: creativity, sound engineering, machine learning, and computing.
We have to rethink the way we conceive of the digital humanities in order to include other kinds of experience, such as hearing, alongside sight. Why not one day also delve into archiving smell, touch, and taste, if technology evolves in that direction? The digital humanities are a constantly changing field.


[1] Roeland Ordelman, Max Kemman, Martijn Kleppe, Franciska de Jong, "Sound and (moving) images in focus – How to integrate audiovisual material in Digital Humanities research", http://dharchive.org/paper/DH2014/Workshops-914.xml
[2] Wikipedia, "Enregistrement sonore" (Sound recording), http://fr.wikipedia.org/wiki/Enregistrement_sonore
[3] Joanna Swafford, "Integrating Score and Sound: 'Augmented Notes' and the Advent of Interdisciplinary Publishing Frameworks", http://dharchive.org/paper/DH2014/Paper-330.xml
[4] Tanya Clement, "Developing for Distant Listening: Developing Computational Tools for Sound Analysis By Framing User Requirements within Critical Theories for Sound Studies", http://dharchive.org/paper/DH2014/Paper-854.xml
[5] Tanya Clement, Kari Klaus, Jentery Sayers, Whitney Trettien, David Tcheng, Loretta Auvil, Tony Borries, Min Wu, Doug Oard, Adi Hajj-Ahmad, Hui Su, Mary Caton Lingold, Daren Mueller, William J. Turkel, Devon Elliott, "<audio>Digital Humanities</audio>: The Intersections of Sound and Method", http://dharchive.org/paper/DH2014/Panel-817.xml
[6] Robert Ruiz, "Analyse acoustique de la voix pour la détection de perturbations psychophysiologiques ; Application au contexte aéronautique" (Acoustic voice analysis for detecting psychophysiological disturbances; application to the aeronautical context), http://blogs.univ-tlse2.fr/robert-ruiz/files/2012/02/Synth%C3%A8se-des-travaux5.pdf