Music Information Retrieval 2 – Algorithms

As we now have some questions about what we want to find in the signal, we can look for algorithms that can provide that information. Here the questions are listed again:

  • Is the sound noiselike or tonelike?
  • Is the sound bright or dark in its sonic character? 
  • What is the rate of change?

For the last question we have not found an answer yet, but we did find an algorithm that I would personally like to experiment with: the MFCC.

Spectral Flatness 

Spectral flatness, also known as the tonality coefficient or Wiener entropy, is a spectral measure that quantifies how tone-like or noise-like a sound is. By analyzing how flat or peaky the spectrum is, it gives out a number: expressed in decibels it ranges from 0 down to minus infinity, where 0 dB corresponds to pure noise (a flat spectrum) and increasingly negative values to tonal material such as a few sine waves. It can also be applied to sub-bands rather than across the whole band.
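As a minimal sketch of the computation in Python with NumPy (the function name is my own choice; a windowed frame of samples is assumed as input):

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    # Power spectrum of one windowed frame
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    # Flatness = geometric mean / arithmetic mean of the power spectrum:
    # ~1 for noise (flat spectrum), ~0 for pure tones
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    return 10.0 * np.log10(flatness)  # in dB: 0 dB = noise, -inf = pure tone
```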

Since the output is a single number, the application could be quite straightforward. Distinguishing between the tonal and non-tonal content of a musician’s playing gives great insight into that musician’s performative intent.

Spectral Entropy

Spectral entropy can be computed over a chosen number of sub-bands. Computed over a single band, it is a measure of the general peakiness of the spectral distribution.
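A minimal single-band sketch under the same assumptions as above, treating the normalised power spectrum as a probability distribution:

```python
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Normalise the power spectrum so it sums to 1, like a probability distribution
    p = power / (np.sum(power) + eps)
    # Shannon entropy: low for a peaky (tonal) spectrum, high for a flat (noisy) one
    return -np.sum(p * np.log2(p + eps))
```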

Spectral Percentile

This calculates the distribution of the spectral energy in a frequency spectrum and outputs the frequency value that corresponds to the desired percentile, i.e. the frequency below which that percentage of the total energy lies. For a high percentile this is the frequency where the spectral roll-off happens, which gives information about the cutoff frequency of a filter.
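A sketch of that calculation, again assuming NumPy and one windowed frame (the 90% default is just an illustrative choice):

```python
import numpy as np

def spectral_percentile(frame, sample_rate, fraction=0.9):
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Frequency of the first bin at which the cumulative energy
    # reaches the requested fraction of the total energy
    cumulative = np.cumsum(power)
    index = np.searchsorted(cumulative, fraction * cumulative[-1])
    return freqs[min(index, len(freqs) - 1)]
```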

Spectral Centroid

This measures the spectral centroid, which is the weighted mean frequency, or the “center of mass”, of the spectrum. This means it can determine whether the measured signal leans more to the bright or to the dull side.
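A corresponding sketch, under the same assumptions as the functions above:

```python
import numpy as np

def spectral_centroid(frame, sample_rate, eps=1e-12):
    magnitude = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Magnitude-weighted mean frequency: the "center of mass" of the spectrum
    return np.sum(freqs * magnitude) / (np.sum(magnitude) + eps)
```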

Mel Frequency Cepstral Coefficients

These are “a small set of features of a signal (usually about 10-20) which concisely describe the overall shape of a spectral envelope. In MIR, it is often used to describe timbre.” (https://musicinformationretrieval.com/mfcc.html) Because of the multitude of values, it is problematic to implement as a modulation source in a Eurorack environment as-is. But with more understanding of the output, one or several control voltages might be derived from it.
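For first experiments on a computer, a library such as librosa can compute MFCCs in a few lines (a sketch: "input.wav" is a placeholder file, and 13 coefficients is just one common choice):

```python
import librosa

# Load a recording at its native sample rate ("input.wav" is a placeholder path)
y, sr = librosa.load("input.wav", sr=None)
# One column of 13 coefficients per analysis frame, shape (13, n_frames)
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfccs.shape)
```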

Music Information Retrieval 1 – What?¿?

One property that sets our planned module apart from modules on the market, which, as ours will, extract pitch, gate, and envelope information from an input signal, is the use of Music Information Retrieval (MIR). This relatively young and still growing field of research seeks to make music machine-readable with machine-learning techniques. In today’s music distribution, which is to a large part handled via streaming services, quick cataloguing and organization are crucial to monetize media collections and keep up with the market. This rather economic approach to music is merely one benefit of the capabilities of MIR. Source separation to create stems, transcription for notation programs, pitch tracking, tempo estimation and beat tracking for converting audio to MIDI, detecting the chords of a song while playing it, Autotune, or key detection to quickly program quantizers in electronic music devices: all of these can be useful tools in music education and music production, and show ways to use MIR in an artistic sense.

There is more than one method to retrieve musical information. Some methods work on the data source itself, deriving their data mostly from digital audio formats such as .wav, .mp3, or .ogg. Even though many of these formats are lossy, and machine listening is more susceptible to compression artifacts than the human ear, much research in the field includes them in its data. Additionally, more and more metadata is mined from the web and incorporated into MIR for a better understanding of music in its cultural context.

Statistics and machine learning also play an important role in this field of research. Many methods compare music against databases and in that way arrive at information about the music in question.

For the performance character of our module, information retrieval has to come almost immediately from the signal fed into the module, without the computational cost of searching databases. Feature extraction, for instance via an FFT, must therefore be the method of choice to gain information quickly. The analysis summarises the music into a feature representation that is reduced enough to yield a manageable set of values within a reasonable time frame.
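To make the idea of per-frame feature extraction concrete, here is a minimal sketch in Python/NumPy (the function name, frame size, and hop size are my own choices, not requirements): the signal is cut into short overlapping windows, and each window is reduced to one value per feature, such as the ones sketched in the algorithms section above.

```python
import numpy as np

def frames(signal, frame_size=1024, hop=512):
    """Yield windowed, overlapping frames for per-frame feature extraction."""
    window = np.hanning(frame_size)
    for start in range(0, len(signal) - frame_size + 1, hop):
        yield signal[start:start + frame_size] * window

# Each frame of 1024 samples is reduced to a handful of feature values,
# e.g.: centroids = [spectral_centroid(f, 48000) for f in frames(signal)]
```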

As we ponder the possibilities of MIR, we should ask ourselves what we could retrieve from the signal to gain some knowledge about the expression of the musician playing into the synth. I did a short brainstorming session with Prof. Ciciliani, and we came up with a few parameters that we decided make sense in a live performance.

Is the sound noiselike or tonelike?

This would give information about the sound coming from the instrument and whether there is a pitch to extract.

Is the sound bright or dark in its sonic character?

This gives information about the playing technique and, depending on the instrument, a form of expression, as many instruments emit additional upper harmonics when played more vigorously.

What is the rate of change?

This can be interpreted in several ways. Measured over a longer period, it could provide additional modulation after a phrase to create some kind of call and response, or a performance reverb if we want to think outside the box. Or, in addition to the envelope follower, the attack ramps of the signal could be compared to create a kind of punch trigger when the playing gets more intense, as in the sketch below.
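A hypothetical sketch of that punch-trigger idea (assuming `envelope` is the frame-rate output of an envelope follower, `hop_time` the time between frames in seconds, and `slope_threshold` a tunable parameter; none of these are fixed yet):

```python
import numpy as np

def punch_triggers(envelope, hop_time, slope_threshold):
    # Slope of the envelope between successive analysis frames, in units per second
    slopes = np.diff(envelope) / hop_time
    # True wherever the attack ramp is steeper than the chosen threshold,
    # i.e. where the playing suddenly gets more intense
    return slopes > slope_threshold
```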

Expression 2 – Quantising

So there is a multitude of values that can be extracted to pick up a musician’s expression in performance. If the music is written down, some of it is readable from the sheet music. Some of it, however, is the individual expression of the musician, which is far more abstract in character and much more difficult to pick up, because it is not possible to predefine or calculate it. So we have to quantify expression somehow directly from the performance. Clemens Wöllner suggests in his opinion article quantifying artistic expression with averaging procedures.

A big point of expression is to raise the attractiveness of the musical piece one is playing, to the point of making it one’s own in performance. Individuality is highly valued in a performer’s expression. Cognitive psychology studies teach us that averaged stimuli, in both the visual and the auditory modality, are perceived as more attractive. Averaging procedures typically produce very smooth results in pictures and sound. Yet listeners typically expect more from a concert or a recording than an even performance; as said, individuality is highly appreciated in music.

In classical genres, expression is often added by subtle timing perturbations and fluctuations in dynamic intensity, as unexpected delays or changes in intensity that differ from the listener’s typical expectations can cause surprise and other emotional reactions and thus contribute to the individual performer’s musical expression. In earlier decades of the 20th century, for instance, musicians typically employed large rubati, deviations in note length, mostly in the melody voice. This is not as common anymore; the changes in note length are far smaller today. Research along these lines has long studied expressive timing deviations from a non-expressive, metronomic version. These timing deviations constitute an individual expressive microstructure, since performers are not able to render a perfectly mechanical, metronomically exact performance. Quantifying those timing variations against a so-called deadpan rendition as the average can therefore not be a valid indicator of individuality.

So musical performances can be averaged along the main quantifiable dimensions of duration, dynamic intensity, and pitch. As for the average performance, seminal studies by Repp in 1997 suggested that attractiveness is raised by not deviating from the average, expected performance; yet a performance is also considered dull if it shows no individuality by straying from the average.

Averaged deviations from the notated pitch in equal temperament could also be analyzed. The sharpening or flattening of tones may reveal certain expressive intentions of individual performers. Also, musicians are able to shape the timbre of certain instruments to some extent, which adds to their expression.

(See: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3685802/, accessed 30.12.2021, 20:12.)