Towards New Visual Representations

In the past, musicological research on traditional Georgian vocal music has usually been based on notated musical scores obtained by manually transcribing audio material from field recordings. Such approaches can be considered problematic, since important tonal cues and performance aspects are likely to get lost in the transcription process. If larynx microphone recordings exist for each singer, or at least for each voice group, several alternative options for the visual representation of recorded music are available, as illustrated in detail in Scherbaum (2016).

Static representation

Melodic and harmonic content of the beginning of the song Elia Lrde.

For three-voiced Georgian vocal music it is particularly interesting that, based on pitch trajectories determined from larynx microphone recordings, one can visualise the complete melodic and harmonic content of a song, including microtonal details, in a single plot. This is illustrated in the figure on the left, which is taken from Scherbaum (2016). The black, red and blue dotted lines in the figure show the pitch tracks for the bass, middle and top voice, respectively. The spaces between the middle and top voice, and between the bass and middle voice, are colored according to the interval sizes between the voices. The space below the bass voice is shaped and color coded according to the interval between the bass and the top voice. It can be seen that the colors representing the intervals between the bass and the top voice are mostly light blue, which corresponds to pitch differences of around 700 cents (a fifth), interrupted once in a while by red, which corresponds to 1200 cents (an octave). The color codes for the pitch differences between the bass and the middle voice indicate values of approximately 500 cents (a fourth), once in a while interrupted by a difference of 700 cents (a fifth). Consequently, the differences between the middle and the top voice correspond to values around 200 cents (a major second), once in a while interrupted by values of approximately 500 cents (a fourth). At times all three voices approach the same pitch value (unison). So at a single glance, the figure reveals the harmonic character of the song Elia Lrde.
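The interval sizes that drive the color coding can be sketched as follows. This is only a minimal illustration of the standard cents formula, not the code used to produce the figure; the function names and the F0 values are hypothetical and were chosen so that the three voices form simple frequency ratios.

```python
import math

def hz_to_cents(f_hz, ref_hz):
    """Pitch of f_hz in cents relative to a chosen reference frequency."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def interval_cents(f_a_hz, f_b_hz):
    """Absolute size in cents of the interval between two F0 values."""
    return abs(hz_to_cents(f_b_hz, f_a_hz))

# Hypothetical F0 values (Hz) of the three voices at one time instant.
bass, middle, top = 150.0, 200.0, 225.0

bass_middle = interval_cents(bass, middle)  # ratio 4:3, close to 500 cents (a fourth)
middle_top = interval_cents(middle, top)    # ratio 9:8, close to 200 cents (a major second)
bass_top = interval_cents(bass, top)        # ratio 3:2, close to 700 cents (a fifth)

print(round(bass_middle, 1), round(middle_top, 1), round(bass_top, 1))
```

Because cents are a logarithmic unit, the bass-middle and middle-top intervals add up to the bass-top interval, which is why a fifth between the outer voices combined with a fourth below leaves a major second above.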


Dynamic representation

In addition to the static representations illustrated in Scherbaum (2016) and discussed above, the availability of multi-media tracks makes it possible to convey both the acoustical content and the musical structure in new ways. This is illustrated below for the Gurian song Chven Mshvidoba, which is also one of the examples in Scherbaum et al. (2018).

Here, the viewer listens to an audio mix from the three headset microphones while watching a vertical cursor (which represents the current audio playback position) moving horizontally over a display window of a chosen duration (here 25 sec). Shown in the display window are the stable pitch elements (horizontal bars) determined from the larynx microphone tracks, together with the corresponding F0-trajectory segments (wiggly lines). The horizontal axis is time in seconds (with respect to the start of the song), while the vertical axis is in cents (with respect to a chosen reference frequency in Hz). In the right part of the display window, the pitch distribution of this song (rotated by 90 degrees) is plotted as a reference for the tuning system used. The tuning system is also visualized by a set of horizontal grid lines, corresponding to the mean values of the individual pitch distribution components. From the spacing of these mean values it can be seen that the tuning used for the song Chven Mshvidoba is significantly different from the 12-tone equal temperament (12-TET) system, where the pitch spacing would be expected to be either 100 or 200 cents. It would therefore be inappropriate to transcribe this song in western 5-line staff notation (which is based on the 12-TET system).
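The comparison with 12-TET can be made concrete with a small sketch. It assumes the grid lines are available as a list of mean pitch values in cents; the numbers below are purely hypothetical and are not measurements from Chven Mshvidoba, and the function names are illustrative.

```python
def step_sizes(grid_cents):
    """Spacing in cents between adjacent grid lines, sorted from low to high."""
    ordered = sorted(grid_cents)
    return [upper - lower for lower, upper in zip(ordered, ordered[1:])]

def deviation_from_12tet(step_cents):
    """Signed distance in cents from the nearest multiple of 100 cents."""
    nearest = 100.0 * round(step_cents / 100.0)
    return step_cents - nearest

# Hypothetical mean values (cents) of the pitch distribution components.
grid = [0.0, 170.0, 350.0, 510.0, 690.0]

steps = step_sizes(grid)
deviations = [deviation_from_12tet(step) for step in steps]

for step, dev in zip(steps, deviations):
    print(f"step {step:6.1f} cents, {dev:+6.1f} cents from nearest 12-TET step")
```

Steps that lie consistently tens of cents away from 100 or 200 cents, as in this made-up grid, are the kind of pattern that would make staff notation a poor fit.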

Whenever the playback cursor falls within a stable pitch segment in any voice, the corresponding pitch element is highlighted and a horizontal bar is superimposed on the  rotated pitch distribution. At the same time, the pitch of the lowest stable pitch element is shown within a green ellipse, subscripted by the interval of the second lowest stable pitch element, and superscripted by the interval of the highest stable pitch element (always with respect to the lowest one). 
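The labelling logic described above might be sketched as follows. This is an assumption-laden illustration rather than the actual display code: stable pitch elements are represented as hypothetical `(start_sec, end_sec, pitch_cents)` tuples per voice, and the behaviour when fewer than three voices are active (showing only the subscript for two voices, and no intervals for one) is a guess, since the text does not specify it.

```python
def active_pitches(segments_by_voice, t):
    """Pitches (cents) of all voices whose stable segment contains time t."""
    active = []
    for segments in segments_by_voice.values():
        for start, end, pitch in segments:
            if start <= t <= end:
                active.append(pitch)
                break  # at most one active segment per voice
    return active

def chord_label(pitches):
    """Return (lowest pitch, subscript interval, superscript interval).

    Intervals are measured in cents above the lowest active pitch.
    Missing voices yield None (an assumption, not taken from the source).
    """
    if not pitches:
        return None
    ordered = sorted(pitches)
    lowest = ordered[0]
    subscript = ordered[1] - lowest if len(ordered) >= 2 else None
    superscript = ordered[-1] - lowest if len(ordered) >= 3 else None
    return lowest, subscript, superscript

# Hypothetical stable pitch elements: (start_sec, end_sec, pitch_cents).
segments = {
    "bass": [(0.0, 2.0, 0.0)],
    "middle": [(0.5, 1.5, 500.0)],
    "top": [(0.8, 3.0, 700.0)],
}

label = chord_label(active_pitches(segments, 1.0))
print(label)
```

Calling `chord_label` once per display frame with the current cursor time would give the three values shown in and around the ellipse.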

One of the advantages of this type of representation is that it adapts automatically to any tuning system used. The display of the active stable pitch elements by vertically moving bars (one for each voice) in the right part of the display window is closely related to chironomic choir singing, which is a very intuitive teaching practice (e.g. d'Alessandro et al., 2014). Understanding the structure of a song from such a representation does not require the ability to read sheet music, which is yet another advantage.


d’Alessandro, C., Feugère, L., Le Beux, S., Perrotin, O., & Rilliard, A. (2014). Drawing melodies: evaluation of chironomic singing synthesis. The Journal of the Acoustical Society of America, 135(6), 3601–12.

Scherbaum, F. (2016). On the benefit of larynx-microphone field recordings for the documentation and analysis of polyphonic vocal music. Proc. of the 6th International Workshop on Folk Music Analysis, 15-17 June, Dublin, Ireland, 80–87.

Scherbaum, F., Rosenzweig, S., Müller, M., Vollmer, D., & Mzhavanadze, N. (2018). Throat microphones for vocal music analysis. In Demos and Late Breaking News of the International Society for Music Information Retrieval Conference (ISMIR), 2018.

Scherbaum, F., Mzhavanadze, N., Rosenzweig, S., & Müller, M. (2019). Multi-media recordings of traditional Georgian vocal music for computational analysis. In 9th International Workshop on Folk Music Analysis, 2-4 July, 2019 (p. submitted). Birmingham.