Language is one of the oldest and most distinctive achievements of Homo sapiens. No other species has this ability to convey ideas and concepts through spoken or written language. Since language is such an old conquest and is used to express personal ideas, it is inevitable that concepts, meanings and forms vary substantially over time and among people of the same era.

1. Orthographic Variations

The paper by Dustin Grue [1] starts from the analysis of the last of these aspects, form, and in particular from orthographic variants between British and American English. Its starting point is a result obtained in previous work by Haeffernan et al., who endorsed a link between orthography and identity. This hypothesis posits that speakers choose variants in order to express their identity. The research carried out by Haeffernan, using data from the student newspaper of the University of Alberta (Canada), showed that in periods with high levels of “anti-American sentiment”, British variants were used far more often. Grue tried to replicate this result using data from a newspaper of the University of British Columbia instead (the province bordering Alberta). He found that the correlation was not statistically significant, which undermines the relation between orthography and identity. The alternative hypothesis he proposes is that a proximal linguistic context factor also drives the selection of orthographic variants. The approach was the same as for word sense disambiguation, that is, distinguishing between ambiguous meanings by using a set of features and a computational model. In this case the features were the surrounding context, represented by a window of 8 words before and after the word analyzed. To obtain significant results, Grue used statistical tools such as Naïve Bayes and Expectation Maximization. The results revealed a historical effect: with the spread of globalization, American variants dominate in global advertising and similar contexts (e.g. British colour vs. American color). In general, Grue found that British variants are dominant when the activity in question is economically or socially local, whereas American variants predominate in more general contexts (Table 1). Grue concluded that context should not be considered the only explanatory variable: contexts are themselves motivated by ideological and interactional reasons.

[Table 1: screenshot from the DH2014 abstracts proceedings, not reproduced here]
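
The context-window idea described above can be illustrated with a small sketch. It is not Grue's implementation: the toy sentences, the `VARIANTS` mapping and the `NaiveBayes` class are all assumptions made up for illustration, and only the Naïve Bayes part is covered (no Expectation Maximization). It extracts up to 8 words on each side of a spelling variant and trains a simple classifier to guess which variant fits a given context.

```python
import math
from collections import Counter, defaultdict

WINDOW = 8  # words taken on each side of the target word

# Hypothetical variant pair, used only for this illustration.
VARIANTS = {"colour": "British", "color": "American"}


def context_window(tokens, index, size=WINDOW):
    """Return up to `size` tokens before and after tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for words, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, words):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def examples_from(sentences):
    """Yield (context words, variant label) for each variant occurrence."""
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in VARIANTS:
                yield context_window(tokens, i), VARIANTS[token]


# Invented toy corpus; not data from either student newspaper.
toy_corpus = [
    "the campus club painted the hall in a new colour for the local fair",
    "students chose the colour of the campus banner at the local meeting",
    "the global brand sells the phone in every color in its online store",
    "buy the new phone online and pick a color from the global catalog",
]

model = NaiveBayes()
model.train(list(examples_from(toy_corpus)))
local_guess = model.predict("campus local fair".split())
global_guess = model.predict("global online store".split())
print(local_guess, global_guess)
```

With real data, the interesting step is inspecting which context words carry the most weight for each variant, since that is where a local versus global pattern would show up.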

2. Semantic Variations

In the paper by Matthew Munson [2] the focus is not on how we choose between different variants of the same word, but on how to track historical variations in the meaning of the same word. Although the author’s aim differs from that of the previous work, he used a very similar approach: inferring the meaning of a word from a window of words surrounding the one under examination. Munson also used statistical analysis to obtain significant results, and he cites the same reference as Grue for these models (Christopher Manning and Hinrich Schütze (1999), “Foundations of Statistical Natural Language Processing”). Specifically, he used co-occurrence patterns in large-scale corpora. The method was applied to two corpora: the Greek Old Testament (the Septuagint) and the Greek New Testament. The results show not only which words experienced the greatest semantic variation, but also how they changed. In the case of the two Testaments, looking at the words most closely related to God reveals that the very idea of God changed between the two corpora: from a potentate who rules and makes war to a patron exchanging favors with his clients.
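
A rough sketch of how co-occurrence patterns can expose a change in meaning is given here. It is only an illustration under stated assumptions, not Munson's actual procedure: the window size, the cosine comparison, the helper names and the two English toy "corpora" are all hypothetical. The idea is to count the neighbours of a target word in each corpus and then compare the two neighbour profiles.

```python
import math
from collections import Counter


def cooccurrences(tokens, target, window=4):
    """Count the words found within `window` positions of each `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


def cosine(a, b):
    """Cosine similarity between two co-occurrence count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def biggest_shifts(a, b, top=3):
    """Neighbours whose relative frequency differs most between a and b."""
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    diff = {w: b[w] / total_b - a[w] / total_a for w in a.keys() | b.keys()}
    return sorted(diff, key=lambda w: abs(diff[w]), reverse=True)[:top]


# Invented English stand-ins for an older and a newer corpus.
older = "the king rules the land and the king makes war".split()
newer = "the king grants favor and the king rewards friends".split()

old_profile = cooccurrences(older, "king", window=2)
new_profile = cooccurrences(newer, "king", window=2)
similarity = cosine(old_profile, new_profile)
print(similarity)
print(biggest_shifts(old_profile, new_profile))
```

A low similarity flags a word whose usage has moved, and the list of shifted neighbours hints at the direction of the change, which is the kind of information the potentate versus patron finding relies on.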

3. Etymology

For the last paper, I chose a work with a different approach to the problem of language change: etymology. In this case the subject examined is not semantic variation through history, but the history of words themselves and their origin. Jonathan Pearce Reeve [3] suggests that the etymology of the words in a text can be indicative of the context or the level of the discourse. In this sense we can see a first connection with the idea behind the paper by Dustin Grue [1]: language and context are related. What changes is the point of view: orthography versus etymology. The technique used by Reeve consists of calculating the proportion of origin languages for all the words of a given text, by means of a program written for this purpose: the Macro-Etymological Analyzer. The results make it possible to recognize stylistic properties and patterns in the text. The first evidence suggests that learned texts and government documents contain many Latinate words, far more than romance or adventure novels. Hellenic words, instead, reach a high proportion in religious language. The Analyzer has also been used on literary works. The findings showed that the language and style used by characters are adapted to their education level. For example, Stephen, the protagonist of Joyce’s A Portrait of the Artist as a Young Man, uses more and more Latinate words as he grows up, reflecting the acquisition of a more mature language. In Virginia Woolf’s novel The Waves, the educated characters have the highest proportion of Latinate words.
I reckon that we can see here a sort of reconciliation between the two hypotheses examined in the first paper by Grue [1], achieved through a completely different approach. In fact, language is not only chosen with respect to the context (for example, Latinate words for government documents), but is also chosen to reflect personal identity (in this case the identity of fictional characters in novels). I also want to highlight that the Macro-Etymological Analyzer is a freely accessible web app at http://jonreeve.com/etym. I used it to analyze this text, with the following results:

[Screenshot: the web page of the Macro-Etymological Analyzer]

[Screenshot: results of the MEA for this article]
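
The core calculation described above, the proportion of origin languages among the words of a text, can be sketched in a few lines. This is not the code of the Macro-Etymological Analyzer: the tiny `ORIGINS` lookup table and the function name are assumptions for illustration, whereas a real tool would rely on a full etymological dictionary.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping words to an origin family.
ORIGINS = {
    "government": "Latinate",
    "document": "Latinate",
    "legislation": "Latinate",
    "authority": "Latinate",
    "king": "Germanic",
    "water": "Germanic",
    "house": "Germanic",
    "friend": "Germanic",
    "theology": "Hellenic",
    "apostle": "Hellenic",
    "baptism": "Hellenic",
}


def origin_proportions(text, origins=ORIGINS):
    """Return the share of each origin family among the words we can classify."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(origins[w] for w in words if w in origins)
    total = sum(counts.values())
    if total == 0:
        return {}  # no word of the text is in the lookup table
    return {family: n / total for family, n in counts.items()}


sample = "The government issued a document and the king drank water"
proportions = origin_proportions(sample)
print(proportions)
```

Words missing from the table are simply ignored here, so the proportions are relative to the classified words only; how unknown words are handled is a design choice that affects the result.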

Ultimately, these three papers give us an idea of how digitization can help us not only to track changes in languages, but also to understand how languages work and how we can use such information to learn something about their speakers. This work can be approached from a variety of points of view; the cases presented here focus on orthography, semantics and etymology. Even with different starting points, we found connections in the results, which suggests a multidisciplinary approach, made possible by the spread of ever more digitized and accessible data.


[1] Grue, Dustin. Does colour mean color?: Disambiguating word sense and ideology in British and American orthographic variants

[2] Munson, Matthew. Tracking Semantic Drift in Ancient Languages: The Bible as Exemplar and Test Case

[3] Reeve, Jonathan Pearce. Macro-Etymological Textual Analysis