
By Jorge Lagos

Musical stylometry is a relatively new field of study that applies quantitative methods to analyze and characterize the style of composers on the basis of the characteristics of their compositions. Important applications include disputed authorship attribution and musical genre classification, but these techniques can more broadly be applied to the analysis of the musical style of a given composer or, more often, to the comparative analysis of the styles of groups of composers. The field has expanded enormously in recent decades, partly because intensive data mining techniques can now be applied at low cost on conventional computers.

In this post we present an overview of the history and state of the art of this field, as a first step towards applying these techniques to the stylometric analysis of Jean-Jacques Rousseau’s music.

A little history

One of the first examples of statistical techniques applied to musical analysis appeared in the work of none other than the linguist George Zipf, who in 1949 reported an analysis of Mozart’s Bassoon Concerto in B-flat major as an example of a system following his well-known “Zipf’s Law” [1]. This empirical law states that, when the items in a collection (e.g. the words in a book) are ranked by how often they occur, the frequency f of an item is approximately inversely proportional to its rank r, i.e. f(r) ∝ 1/r: a few items occur very often, while many others occur only rarely. Zipf first observed this trend in natural language, but soon found that it applied to a very broad range of phenomena, both natural and artificial, and his law has indeed been verified time and again in many fields since he first proposed it. In his Mozart example, Zipf manually counted the intervals of a given length between repetitions of each note in the piece, as well as the number of “steps” of a given size between consecutive notes, and then plotted the associated histograms to produce what is known as a frequency-rank distribution plot. The resulting distributions showed the expected 1/r trend, which on a log-log plot appears as a straight line with a slope of -1, as shown in Fig. 1.
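As an illustration (not a reconstruction of Zipf’s manual procedure), the two counts described above can be sketched in Python, assuming the melody is given as a list of MIDI note numbers. The helper names and the toy melody are hypothetical.

```python
from collections import Counter


def melodic_steps(pitches):
    """Signed size, in semitones, of each step between consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def repetition_distances(pitches):
    """Distance, in notes, between each note and the previous occurrence of the same pitch."""
    last_seen = {}
    distances = []
    for i, pitch in enumerate(pitches):
        if pitch in last_seen:
            distances.append(i - last_seen[pitch])
        last_seen[pitch] = i
    return distances


def rank_frequency(items):
    """(rank, frequency) pairs, rank 1 being the most frequent distinct item."""
    frequencies = sorted(Counter(items).values(), reverse=True)
    return list(enumerate(frequencies, start=1))


if __name__ == "__main__":
    # Hypothetical toy melody (MIDI note numbers), not taken from Mozart's score.
    melody = [60, 62, 64, 62, 60, 60, 62, 64, 65, 64, 62, 60]
    print(rank_frequency(melodic_steps(melody)))  # [(1, 4), (2, 4), (3, 1), (4, 1), (5, 1)]
    print(rank_frequency(repetition_distances(melody)))
```

Plotting log(frequency) against log(rank) for these pairs gives a frequency-rank plot of the kind shown in Fig. 1; a meaningful analysis would of course need far more notes than this toy melody.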


Fig. 1. – Excerpt of G. Zipf’s results (1949) [1]

Another seminal work was carried out by Wilhelm Fucks in the 1960s [2]. He was one of the first scholars to study differences in musical style among composers using statistical methods. In particular, he analyzed the pitch histograms of pieces by Beethoven, Webern, Strauss and Berg, and noted evident differences, which he quantified by means of the kurtosis of the resulting curves, as shown in Fig. 2.
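Kurtosis measures how peaked a distribution is and how heavy its tails are. The sketch below computes the plain moment ratio m4 / m2² of a sample of pitches (MIDI note numbers assumed); we do not know which exact estimator Fucks used, so treat this as a generic illustration with hypothetical data.

```python
def pitch_kurtosis(pitches):
    """Kurtosis m4 / m2**2 of a sample of pitches (MIDI note numbers).

    Equals 3 for a normal distribution; higher values indicate a more peaked
    histogram with heavier tails. Subtract 3 to obtain the "excess" kurtosis.
    """
    n = len(pitches)
    mean = sum(pitches) / n
    m2 = sum((p - mean) ** 2 for p in pitches) / n  # second central moment
    m4 = sum((p - mean) ** 4 for p in pitches) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all pitches are equal")
    return m4 / m2 ** 2


if __name__ == "__main__":
    # Hypothetical samples: an evenly spread line vs. one centred on a single pitch.
    print(pitch_kurtosis([60, 61, 62, 63, 64]))  # 1.7
    print(pitch_kurtosis([60] * 8 + [48, 72]))   # larger: sharply peaked histogram
```

A pitch histogram is simply the tally of such a list, so the same value can be read as a property of the histogram’s shape.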


Fig. 2. – Excerpt of W. Fucks’ results (1962) [2]

Fucks was also the first to apply quantitative techniques to the evolution of musical style metrics over time. By plotting the standard deviation of pitch in several compositions from different periods, he was able to identify a clear trend over time, as reported in Fig. 3.
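A minimal sketch of this kind of diachronic analysis, assuming each piece is reduced to a composition year and a list of MIDI pitches: compute the pitch standard deviation per piece, then fit an ordinary least-squares line against the year. The pieces below are hypothetical and are not Fucks’ data.

```python
from statistics import pstdev


def pitch_spread(pitches):
    """Population standard deviation of the pitches (MIDI numbers) of one piece."""
    return pstdev(pitches)


def trend_slope(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical pieces: (composition year, pitch sequence).
    pieces = [
        (1720, [60, 62, 64, 65, 67]),
        (1810, [55, 60, 64, 67, 72]),
        (1910, [48, 59, 66, 74, 83]),
    ]
    years = [year for year, _ in pieces]
    spreads = [pitch_spread(pitches) for _, pitches in pieces]
    print(spreads, trend_slope(years, spreads))  # positive slope: wider pitch spread over time
```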


Fig. 3. – Excerpt of W. Fucks’ results (1962) [2]


We now review the most important works on musical style analysis from the last decade. Unsurprisingly, all of them are based on data mining techniques, and in particular on classification methods relying on supervised or unsupervised machine learning. Supervised methods learn from labeled examples (e.g. pieces of known authorship) in order to classify new ones, whereas unsupervised methods try to find hidden structure in unlabeled data [3].

Manaris et al. (2005)

The ideas proposed by Zipf inspired an international group of researchers to perform musical genre characterization and authorship attribution experiments on the basis of Zipfian metrics [4].

In a first experiment, several attributes were extracted from a corpus of 220 MIDI pieces from the Classical Archives online database. These included common musical attributes such as note pitch, rests, duration, harmonic and melodic intervals, and chords. The authors then computed the frequency-rank distribution of each attribute and fitted a straight line to it on a log-log scale. Finally, they used the slope of the fit and the associated Pearson correlation coefficient (R) as metrics for characterizing each genre. For instance, Fig. 4 illustrates the resulting plots and metrics for Chopin’s Revolutionary Etude, Op. 10 No. 12 in C minor (left) and Bach’s Orchestral Suite No. 3 in D, movement no. 2, BWV 1068 (right).
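The two metrics can be sketched as an ordinary least-squares fit on the log-log frequency-rank points. This is a generic illustration, not Manaris et al.’s code: the function name is hypothetical, and we assume frequencies are already sorted by rank. Note that for a decreasing trend Pearson’s r is negative, and its square r² is the coefficient of determination often reported alongside the slope.

```python
import math


def zipf_fit(frequencies):
    """Least-squares fit of log10(frequency) against log10(rank).

    `frequencies` must be sorted in descending order (rank 1 first).
    Returns (slope, r): the slope of the fitted line and Pearson's correlation
    coefficient of the log-log points. Ideal Zipfian data gives slope -1, r -1.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks")
    xs = [math.log10(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        raise ValueError("all frequencies are equal; correlation is undefined")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r


if __name__ == "__main__":
    # Synthetic, perfectly Zipfian counts: frequency = 1000 / rank.
    ideal = [1000 / rank for rank in range(1, 21)]
    slope, r = zipf_fit(ideal)
    print(f"slope={slope:.3f}, r={r:.3f}, r^2={r * r:.3f}")  # slope=-1.000, r=-1.000, r^2=1.000
```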


Fig. 4. – Excerpt of Manaris’ results [4]

Remarkably, all the extracted metrics followed Zipfian distributions, with slope values close to the ideal value of -1 that characterizes exact Zipfian behavior (frequency inversely proportional to rank). These results are reported in Fig. 5.


Fig. 5. – Excerpt of Manaris’ results [4]

In a second experiment, the authors employed these metrics for authorship attribution. Using corpora of hundreds of pieces by Purcell, Bach, Chopin, Debussy and Scarlatti, they extracted between 30 and 81 metrics per piece and used a portion of the resulting feature vectors to train an artificial neural network (ANN) with the Stuttgart Neural Network Simulator, considering several ANN architectures. The remaining vectors were used to test the ability of the ANN to recognize authorship. Fig. 6 summarizes the results of this experiment: success rates above 90% were achieved.
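The train/test protocol described above can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier stands in for the ANN; this is a deliberate substitution, not the method used in [4], and all names and data are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (features, composer) pairs per composer, so each appears in both sets."""
    by_composer = defaultdict(list)
    for sample in samples:
        by_composer[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for group in by_composer.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def fit_centroids(train):
    """Mean feature vector of each composer (stand-in for training an ANN)."""
    sums, counts = {}, defaultdict(int)
    for features, composer in train:
        total = sums.setdefault(composer, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[composer] += 1
    return {c: [v / counts[c] for v in total] for c, total in sums.items()}


def success_rate(centroids, test):
    """Fraction of held-out pieces assigned to their true composer."""
    hits = 0
    for features, composer in test:
        predicted = min(centroids, key=lambda c: math.dist(centroids[c], features))
        hits += predicted == composer
    return hits / len(test)


if __name__ == "__main__":
    # Hypothetical two-feature vectors for two composers (not the data of [4]).
    data = [([i * 0.1, 0.0], "Bach") for i in range(10)]
    data += [([10 + i * 0.1, 10.0], "Chopin") for i in range(10)]
    train, test = stratified_split(data)
    print(success_rate(fit_centroids(train), test))  # 1.0 on this well-separated toy set
```

Stratifying the split per composer guarantees that every class is seen during training, which a plain random split of a small corpus does not.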


Fig. 6. – Excerpt of Manaris’ results [4]

Kranenburg et al. (2005)

In [5], this research group approached musical stylometry for authorship attribution using non-Zipfian style markers. The authors discuss in detail the problem of selecting the metrics used for classification, and argue that the main problem with traditional approaches is that the effectiveness of a given feature can only be assessed a posteriori, once the experiment has been run; beforehand, very little can be said about which metrics will turn out to be useful style discriminators. As a result, a very large set of potentially interesting metrics is usually extracted, even though only a few of them will actually prove useful for discrimination (and some may even hinder it).

The authors then report the results of a classification study on five 18th-century composers, for which they initially consider 20 style markers corresponding to what they term low-level metrics of counterpoint, including time-slice stability, the percentage of dissonant sonorities, the entropies of sonority, pitch and harmony, and voice density. They disregard the more usual high-level metrics (such as key or the presence of known musical motifs) in favor of the low-level ones, arguing that the former are more likely to reflect the characteristics of individual compositions than the underlying style of the composer. They then performed a feature selection step to determine the best metrics for class discrimination, using the floating forward selection algorithm, and finally applied the C4.5 algorithm to build a decision tree from which the actual classification decisions were made. These experiments were implemented using an open pattern recognition toolbox for Matlab.
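Floating forward selection alternates a greedy inclusion step with conditional exclusion steps, which lets it undo early choices that later turn out to be poor. Below is a minimal sketch of the textbook sequential floating forward selection (after Pudil et al., 1994), assuming a generic `criterion` function (e.g. cross-validated accuracy); it is not the implementation used in [5].

```python
def sffs(features, criterion, k):
    """Sequential floating forward selection (after Pudil et al., 1994).

    features:  candidate feature names.
    criterion: function mapping a list of features to a score (higher is better).
    k:         desired subset size.
    Returns (score, subset) for the best subset of size k that was found.
    """
    features = list(features)
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    selected = []
    best = {}  # subset size -> (best score seen, subset)

    def without(subset, feature):
        return [f for f in subset if f != feature]

    while len(selected) < k:
        # Inclusion: add the remaining feature that most improves the criterion.
        added = max((f for f in features if f not in selected),
                    key=lambda f: criterion(selected + [f]))
        selected = selected + [added]
        score = criterion(selected)
        if len(selected) not in best or score > best[len(selected)][0]:
            best[len(selected)] = (score, selected)
        # Conditional exclusion: drop features while the smaller subset beats the
        # best subset of that size found so far. The first exclusion may not
        # remove the feature that was just added.
        protected = added
        while len(selected) > 2:
            drop = max(selected, key=lambda f: criterion(without(selected, f)))
            if drop == protected:
                break
            reduced = without(selected, drop)
            reduced_score = criterion(reduced)
            if reduced_score <= best[len(reduced)][0]:
                break
            selected = reduced
            best[len(selected)] = (reduced_score, selected)
            protected = None  # later exclusions may drop any feature
    return best[k]
```

For example, with a purely additive criterion `lambda s: sum(w[f] for f in s)`, the procedure simply picks the k highest-weighted features; the floating steps only matter when features interact.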

In a first experiment, the authors extracted the aforementioned metrics from a corpus of 300 pieces from the online MuseData repository, including compositions by J.S. Bach, Handel, Telemann, Mozart and Haydn. The objective was to contrast Bach’s style with that of the other composers. Fig. 7 shows the results of the feature selection step for this experiment, with the trends of the best style discriminators for the classes {Bach} (solid line) and {not Bach} (dotted lines).


Fig. 7. – Excerpt of Kranenburg’s results [5]

Using the three best style markers from Fig. 7, the authors created the scatterplot reported in Fig. 8, where the region characterizing Bach’s style corresponds to the highlighted lower-left rectangle. Finally, they applied a k-nearest-neighbor classifier based on these three features to the initial dataset, obtaining a leave-one-out error rate of 7%.
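The leave-one-out error rate mentioned above is obtained by classifying each piece with a classifier trained on all the other pieces and counting the mistakes. A minimal k-nearest-neighbor sketch, assuming Euclidean distance and majority voting (the distance and tie-breaking used in [5] are not stated here); the data in the usage note is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_error(data, k=3):
    """Classify each piece using all the others as training data; return the error rate."""
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) != label:
            errors += 1
    return errors / len(data)
```

For instance, with four hypothetical "Bach" points clustered near (0, 0) and four "not Bach" points near (10, 10), every piece is classified correctly and the error rate is 0.0; adding one "Bach" point inside the other cluster produces exactly one error, i.e. a rate of 1/9.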


Fig. 8. – Excerpt of Kranenburg’s results [5]

In a second experiment, the researchers used combinations of their low-level metrics to address authorship attribution, applying them to the F-minor organ fugue BWV 534, whose attribution to J.S. Bach is disputed. They extracted features from compositions by J.S. Bach and by two other potential authors: J.L. Krebs (a former pupil) and W.F. Bach (his son). They then used Fisher’s linear discriminant to transform the feature space and obtain the two best discriminants for the classification problem, producing the scatterplot in Fig. 9, on which the data for the disputed fugue is also overlaid. The graph shows a close match with J.L. Krebs’ data, and the authors therefore conclude that Krebs is most probably the genuine author of the piece.
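For reference, the standard multi-class Fisher criterion is given below as textbook background, not as the exact formulation used in [5]. With C classes, class means μ_c, class sizes N_c, overall mean μ and sample sets X_c, the projection W maximizes between-class scatter relative to within-class scatter:

```latex
S_W = \sum_{c=1}^{C} \sum_{\mathbf{x} \in \mathcal{X}_c}
      (\mathbf{x} - \boldsymbol{\mu}_c)(\mathbf{x} - \boldsymbol{\mu}_c)^{\top},
\qquad
S_B = \sum_{c=1}^{C} N_c \, (\boldsymbol{\mu}_c - \boldsymbol{\mu})(\boldsymbol{\mu}_c - \boldsymbol{\mu})^{\top}

W^{\ast} = \arg\max_{W}
  \frac{\det\left(W^{\top} S_B W\right)}{\det\left(W^{\top} S_W W\right)},
\qquad
\operatorname{rank}(S_B) \le C - 1
```

The columns of W* are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues. Because the rank of S_B is at most C - 1, three candidate composers (C = 3) yield at most two informative discriminants, which is consistent with the two-dimensional scatterplot of Fig. 9.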


Fig. 9. – Excerpt of Kranenburg’s results [5]

Dor et al. (2011)

A further improvement over the previous approaches was proposed in [6]. The authors build on the idea of using low-level features as the main style discriminators. They also introduce a machine-learning tool for the automatic discovery of new features, and use a combination of manually and automatically generated metrics in their classification experiments. This work is also important because it was the first to study and quantify the individual contribution of each feature to the overall classification accuracy.

Using a corpus of 1,183 scores by nine composers from the Humdrum project’s database, the authors conducted several classification experiments. To this end, they formed multiple datasets containing pieces by the various composers, as reported in Fig. 10.


Fig. 10. – Datasets employed by Dor et al. [6]

In a first experiment, they employed seven different classifiers from the Waikato Environment for Knowledge Analysis (Weka) suite and evaluated their performance on the aforementioned datasets, obtaining the results shown in Fig. 11. The best classifiers were found to be “Simple Logistic” and “Random Forest”.


Fig. 11. – Excerpt of Dor’s results [6]

The most interesting result, however, concerns how the different features contributed to the classification accuracy. The authors report how the accuracy evolves as features are included, obtaining the graphs in Fig. 12. From these results they observe that no general a priori assumptions can be made about the relevance of the features, since features with important contributions in some datasets can have irrelevant or even detrimental contributions in others.
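The kind of analysis summarized in Fig. 12 can be sketched as follows: features are included one at a time in a fixed order, and the change in accuracy attributable to each is recorded, a negative change marking a detrimental feature. The `evaluate` callback (standing for any cross-validated classifier) and the toy accuracies are hypothetical; we do not claim this is Dor and Reich’s exact procedure.

```python
def inclusion_curve(ordered_features, evaluate):
    """Add features one at a time and record each one's effect on accuracy.

    ordered_features: features in the order they are to be included.
    evaluate:         function mapping a list of features to an accuracy in [0, 1]
                      (hypothetical; e.g. cross-validated accuracy of a classifier).
    Returns a list of (feature, accuracy_after_adding_it, change_in_accuracy).
    """
    included = []
    previous = evaluate(included)  # baseline with no features (e.g. majority class)
    curve = []
    for feature in ordered_features:
        included = included + [feature]
        accuracy = evaluate(included)
        curve.append((feature, accuracy, accuracy - previous))
        previous = accuracy
    return curve


if __name__ == "__main__":
    # Toy accuracies indexed by subset size, chosen only to illustrate a harmful feature.
    toy = {0: 0.50, 1: 0.70, 2: 0.65, 3: 0.80}
    for feature, acc, delta in inclusion_curve(["f1", "f2", "f3"], lambda s: toy[len(s)]):
        verdict = "helps" if delta > 0 else "hurts" if delta < 0 else "neutral"
        print(f"{feature}: accuracy={acc:.2f} ({delta:+.2f}, {verdict})")
```

Since each delta depends on which features were already included, the same feature can help in one ordering or dataset and hurt in another, which is precisely the observation reported above.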


Fig. 12. – Excerpt of Dor’s results [6]

A second experiment conducted by these authors considered the two-composer classification problem. They trained a support vector machine with the Sequential Minimal Optimization (SMO) algorithm on each of the composer-pair datasets, obtaining the results shown in Fig. 13. The accuracies reported by these researchers are well above those of the previous works.


Fig. 13. – Excerpt of Dor’s results [6]

Finally, the individual feature contributions for the two-composer classifications are reported in Fig. 14. Once again, the contribution of individual features varies considerably across experiments, and in some cases it can even be detrimental to the classification process.


Fig. 14. – Excerpt of Dor’s results [6]


The field of musical stylometry is a burgeoning area for the application of statistical classification methods. In this post we have reviewed some of the most relevant approaches to musical style characterization and authorship attribution. All of the most recent proposals are based on data mining techniques, in which machine learning algorithms are first trained on known, labeled data and then used to classify unlabeled data. The reviewed approaches report very good performance in the differential characterization of style among pieces by well-known classical composers, so their application to the project at hand, the stylometric characterization of Jean-Jacques Rousseau’s music, seems promising.


(also available at https://www.zotero.org/spectrallypure/items)

[1] G. Zipf, “Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology,” Addison-Wesley Press, 1949.

[2] W. Fucks, “Mathematical Analysis of Formal Structure of Music,” IRE Transactions on Information Theory, vol. 8, no. 5, pp. 225–228, Sept. 1962.

[3] C. Weihs, U. Ligges, F. Mörchen, and D. Müllensiefen, “Classification in Music Research,” Advances in Data Analysis and Classification, vol. 1, pp. 255–291, 2007.

[4] B. Manaris, J. Romero, P. Machado, D. Krehbiel, T. Hirzel et al., “Zipf’s Law, Music Classification, and Aesthetics,” Computer Music Journal, vol. 29, no. 1, pp. 55–69, Feb. 2005.

[5] E. Backer and P. van Kranenburg, “On Musical Stylometry - A Pattern Recognition Approach,” Pattern Recognition Letters, vol. 26, no. 3, pp. 299–309, Feb. 2005.

[6] O. Dor and Y. Reich, “An Evaluation of Musical Score Characteristics for Automatic Classification of Composers,” Computer Music Journal, vol. 35, no. 3, pp. 86–97, Sept. 2011.