What Some Dead White Guy Did: A Statistical Topology
Of Keyboard Usage In Beethoven's Sonatas For Pianoforte

2. Methodology

One of the major drawbacks of large-scale statistical analysis of music is the time needed to enter data in a machine-readable format that a computer can then analyse. This problem has been greatly alleviated by the proliferation of works from the classical repertoire available online, encoded both as files readable by notation programs such as Sibelius or Finale and as MIDI files. Although MIDI files can be problematic for analyses in which the notation of a work is the focus of investigation (the MIDI format does not store a work's notational representation), they are very useful for tasks in which note frequency, duration, dynamics, orchestration or other notation-independent parameters are the key concern.

In this study, MIDI files for all thirty-two of Beethoven's pianoforte sonatas were downloaded from the website http://www.kunstderfuge.com/beethoven/sonatas.htm and analysed. Although many MIDI arrangements of Beethoven's sonatas exist, these files were chosen because they appear to have been exported directly from a notation program rather than recorded from a live performance: the rhythmic data are metronomic, showing a strict quantisation that could not have come from human playing. As this analysis takes note duration as its primary object of investigation, using MIDI files generated directly from the notation, free of the complications of performance practice, leaves the data less biased by interpretive decisions and closer to the notation itself.
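The quantisation argument can also be checked mechanically. The Python sketch below is a hypothetical helper, not part of the original workflow; it assumes onset times have already been extracted as absolute MIDI ticks, and the 480 ticks-per-quarter resolution, the 20-tick grid and the sample onsets are illustrative assumptions. Files exported from notation should place essentially every onset on the grid, whereas a recorded performance will not.

```python
def fraction_on_grid(onsets, grid):
    """Share of onset times (absolute ticks) that fall exactly on a multiple of `grid`.

    A value at or near 1.0 suggests quantised, notation-derived data;
    live performances scatter onsets around the grid.
    """
    if not onsets:
        raise ValueError("no onsets supplied")
    on_grid = sum(1 for t in onsets if t % grid == 0)
    return on_grid / len(onsets)


# Hypothetical data at 480 ticks per quarter note. A 20-tick grid (480 // 24)
# accommodates sixteenths, thirty-seconds and triplet subdivisions.
GRID = 480 // 24
notated = [0, 120, 240, 360, 480, 640, 800]    # sixteenths, then eighth triplets
performed = [0, 117, 245, 362, 479, 633, 806]  # the same passage as it might be played

print(fraction_on_grid(notated, GRID))    # 1.0
print(fraction_on_grid(performed, GRID))  # well below 1.0
```

An exact-multiple test is deliberately strict; a tolerance window would be needed to distinguish lightly humanised files from genuinely performed ones.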

The second reason for using this corpus is that it is one of the few collections in which all of the piano sonatas were encoded by one person, Bunji Hisamori. Although this means that the data represent only one person's interpretation of the works, and thus reduces the generality of the results, it should also mean that keyboard usage can be compared across the whole range of sonatas: any disparity between different encoders' interpretations of tempo markings, or of the durations of ambiguous rhythmic devices such as acciaccaturas, is removed.

These MIDI files were then checked for major errors, especially in the highest and lowest notes used. This check revealed that the files were based on a version for modern piano in which isolated phrases had been raised or lowered by an octave, or doubled at the octave, to extend the range. These phrases were restored in the MIDI files to the original notation of the sonatas, using the edition of the sonatas edited by Harold Craxton (Beethoven, 1958). This was done to keep the MIDI files as close as possible to the original notation, and so to preserve the imprint of the instruments on or for which the sonatas were written. A few other errors were also corrected. A full list of the corrections can be found in Appendix I.
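The range check and the two kinds of octave correction described above can be sketched in Python as follows. This is not the procedure actually used (the corrections were made by hand against the Craxton edition and are listed in Appendix I); the `Note` record, the helper names and the MIDI 29 to 89 (F1 to F6) limits in the example are illustrative assumptions, not claims about any particular instrument of Beethoven's.

```python
from typing import NamedTuple


class Note(NamedTuple):
    onset: int   # absolute start time (ticks)
    offset: int  # absolute end time (ticks)
    pitch: int   # MIDI note number, 60 = middle C


def out_of_range(notes, lowest, highest):
    """Notes whose pitch falls outside the inclusive range [lowest, highest]."""
    return [n for n in notes if not lowest <= n.pitch <= highest]


def shift_octave(notes, start, end, octaves):
    """Transpose every note with start <= onset < end by a whole number of octaves."""
    return [
        n._replace(pitch=n.pitch + 12 * octaves) if start <= n.onset < end else n
        for n in notes
    ]


def drop_octave_doubling(notes, start, end, octaves=1):
    """Within start <= onset < end, remove notes that double another note at the
    same onset `octaves` octaves away (octaves=1 removes the upper copy,
    octaves=-1 removes the lower copy)."""
    in_window = {(n.onset, n.pitch) for n in notes if start <= n.onset < end}
    return [
        n for n in notes
        if not (start <= n.onset < end and (n.onset, n.pitch - 12 * octaves) in in_window)
    ]


# Illustrative limits only: MIDI 29 (F1) to 89 (F6).
LOWEST, HIGHEST = 29, 89
notes = [Note(0, 480, 60), Note(0, 480, 72), Note(480, 960, 96)]

print(out_of_range(notes, LOWEST, HIGHEST))        # [Note(onset=480, offset=960, pitch=96)]
print(shift_octave(notes, 480, 960, -1)[2].pitch)  # 84
print(drop_octave_doubling(notes, 0, 480))
```

Each helper returns a new list rather than editing in place, so an uncorrected copy of the data remains available for comparison.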

These files were then converted into a more readable text format, first with the MIDI-to-text conversion tool found at http://flashmusicgames.com/midi/mid2txt.php and later with the Midi2Mtx converter found at http://www.midiox.com/. Absolute rather than delta times were used for this conversion: each event is stamped with its time from the beginning of the track rather than with its offset from the preceding event, so a note's duration can be found by subtracting its onset time from its release time. This creates a file of tempo-independent absolute durations in milliseconds that can easily be used to calculate total keyboard usage over the entire corpus.
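The practical difference between delta and absolute times can be shown with a small Python sketch. The event tuples, the function name and the tick values are hypothetical; the point is only that a running sum turns delta times into absolute times, after which each note's duration is a single subtraction.

```python
from itertools import accumulate

# Hypothetical events: (delta_ticks, kind, pitch), where kind is "on" or "off".
events = [
    (0, "on", 60),     # C4 pressed at the start
    (0, "on", 64),     # E4 pressed at the same instant
    (480, "off", 60),  # C4 released 480 ticks later
    (240, "off", 64),  # E4 released 240 ticks after that
]


def durations_from_deltas(events):
    """Total held time per pitch, computed by first converting delta times to absolute times."""
    absolute = accumulate(delta for delta, _, _ in events)  # running sum = absolute time
    pressed_at = {}  # pitch -> absolute time of the most recent "on"
    durations = {}   # pitch -> accumulated held time
    for time, (_, kind, pitch) in zip(absolute, events):
        if kind == "on":
            pressed_at[pitch] = time
        else:
            durations[pitch] = durations.get(pitch, 0) + time - pressed_at.pop(pitch)
    return durations


print(durations_from_deltas(events))  # {60: 480, 64: 720}
```

Read as delta times, E4's release value (240) says nothing about how long E4 was held; only after accumulation does its release at tick 720 give the correct duration.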

Emacs Lisp code was then written to extract the note data from these files and to compile, for each sonata, the total time for which each key was depressed.
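The original extraction was written in Emacs Lisp and is not reproduced here. As a rough illustration of the same idea, the Python sketch below totals the time each key is held down. It assumes the mf2t-style text layout produced by converters such as Midi2Mtx, with absolute times and note events written as `<time> On ch=<channel> n=<note> v=<velocity>`, and treats an `On` event with velocity 0 as a release; the exact layout of the files used in this study is an assumption. Times are summed in whatever unit the file uses, and velocity is read only to detect such releases.

```python
import re
from collections import defaultdict

# Assumed mf2t-style note lines with absolute times, e.g.
#   "480 On ch=1 n=60 v=64"   (key pressed)
#   "960 Off ch=1 n=60 v=0"   (key released)
NOTE_EVENT = re.compile(r"^(\d+)\s+(On|Off)\s+ch=(\d+)\s+n=(\d+)\s+v=(\d+)")


def key_usage(lines):
    """Total time each MIDI note number is held down, summed over all tracks and channels."""
    held = {}                  # (channel, note) -> time at which the key went down
    totals = defaultdict(int)  # note -> accumulated held time
    for line in lines:
        line = line.strip()
        if line == "MTrk":
            held.clear()       # assumed: absolute times restart at 0 in each track
            continue
        m = NOTE_EVENT.match(line)
        if not m:
            continue           # header, tempo, meta and controller lines are skipped
        time, kind, ch, note, vel = m.groups()
        time, note, vel = int(time), int(note), int(vel)
        key = (int(ch), note)
        if kind == "On" and vel > 0:
            held.setdefault(key, time)  # a repeated On while held keeps the first onset
        elif key in held:               # Off, or On with v=0, releases the key
            totals[note] += time - held.pop(key)
    return dict(totals)


sample = """MFile 1 2 480
MTrk
0 On ch=1 n=60 v=64
480 Off ch=1 n=60 v=0
480 On ch=1 n=60 v=70
960 On ch=1 n=60 v=0
TrkEnd""".splitlines()

print(key_usage(sample))  # {60: 960}
```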

Dynamics were ignored, as it was felt that factoring them in would give quieter movements less weight than louder ones. Ignoring them also allowed the study to concentrate on the topology of keyboard usage rather than on the sounding result.

In short, the extracted data gave only the total time for which each key was depressed over the course of each sonata. These data were then ordered and compiled into tables.
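The tabulation step might look like the following Python sketch, which ranks keys by total depression time and labels them with scientific pitch names (taking MIDI 60 as C4). Ordering by total duration is an assumption, since the tables could equally be ordered by pitch; the function names and the sample totals are hypothetical.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(n):
    """MIDI note number to scientific pitch name, taking 60 as C4 (middle C)."""
    return f"{NAMES[n % 12]}{n // 12 - 1}"


def usage_table(totals):
    """Rows of (pitch name, total time, % of all key-down time), most-used key first."""
    grand = sum(totals.values())
    if grand == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(note_name(n), t, 100 * t / grand) for n, t in ranked]


# Hypothetical per-key totals for one sonata.
for name, total, share in usage_table({60: 960, 67: 480, 48: 2400}):
    print(f"{name:>4} {total:>8} {share:6.2f}%")
```

Expressing each key's total as a share of all key-down time makes sonatas of very different lengths directly comparable.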