_Making Sound playable

One could summarize the paper by Christopher Ariza as ‘using a controller as an interface for live music performance’: how it works and what its benefits and limitations are. The controller, often referred to in the paper as a ‘Dual-Analog Gamepad’, was originally designed as a gaming peripheral for consoles and computers. But some people figured out back then that all of its inputs could also be read by a computer and translated into MIDI signals. Those signals could then be used to map certain sounds or modifiers to the inputs, so that pressing the buttons on the controller generates music. This is not even limited to one instrument or soundscape. Since there are buttons left over on the device, some of them can be used to switch between different instruments, which may have different constraints or be controlled completely differently from one another. Complex interaction patterns such as simultaneous button presses are also possible, so the number of immediately available instruments increases vastly.

This whole approach is not a new invention: repurposing digital interfaces by translating their various interactions into inputs for a machine that generates something. Yet it is not seen everywhere. In most live music performances, such ‘input methods’ are very rare, although they could greatly enhance the audience’s perception of the artist. The artist would then be doing more than interacting with a laptop through ‘conventional’ input methods like mouse and keyboard. As the paper correctly states, it would create the impression that the artist is actually ‘playing an instrument’ and is proficient in its use.

Coming back to the paper: it mostly focuses on explaining existing interface mappings for controllers, but its main goal is to promote the use of, and experimentation with, literally ‘playing’ a controller to create new music-making experiences.

What struck me as most interesting is that the paper is now roughly ten years old, and numerous improvements and advances have been made in controller technology since then. So if someone were to harness the various sensors, input and feedback methods of a newest-generation controller like the PlayStation 5 DualSense controller, the possibilities would be mind-boggling.

To recap what this little piece of plastic and electronics can do:

  • 16 discrete buttons
  • 2 thumb sticks (essentially joysticks), which can also be pressed
  • Adaptive triggers for haptic feedback (creating various kinds of resistance when pressing the triggers), which can also differentiate between different strengths of button presses
  • Touchpad (also pressable, like a button) that can track up to 2 fingers very precisely and distinguish between press locations, such as left and right
  • Vibration motors for haptic feedback (precise rumble sensations)
  • Accelerometer
  • Gyro sensor
  • LED light panel capable of displaying many colours
  • Built-in speaker
  • Built-in microphone
  • Headphone jack
  • Bluetooth connectivity (even optimized for Apple products out of the box)

So, it is quite a list of things a new-generation controller can do. For example, I thought of switching instruments by dividing the touchpad into segments, so that touching different segments activates different instruments. In addition, the currently selected instrument could be shown by a corresponding colour on the LED panel. To top it off, a successful switch to another instrument could be confirmed by a short rumble of the controller, like a little shockwave, giving haptic feedback on the change. Also, since the touchpad can detect touch and swipe inputs, an interaction like DJ scratching could be emulated. There is one example of a game that uses the touchpad for a guitar-playing minigame: The Last of Us Part II (TLOU Part II). You choose a chord from a radial menu of presets with the thumb stick, and then strike individual strings, or all of them, on the touchpad to produce a sound.
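The touchpad idea above can be sketched as a small mapping function. This is a minimal sketch with several assumptions. The touchpad x-range of 0 to 1919 is assumed, and the instrument names and colours are hypothetical. The 80 ms rumble is an arbitrary choice. Actually sending the colour and rumble to a DualSense would need a separate driver or library, which is not shown; the function only returns the feedback values.

```python
from dataclasses import dataclass

# Hypothetical instrument slots: (name, RGB colour for the LED panel)
INSTRUMENTS = [
    ("drums", (255, 0, 0)),
    ("bass", (0, 0, 255)),
    ("lead", (0, 255, 0)),
    ("pads", (255, 255, 0)),
]

@dataclass
class ControllerState:
    instrument: int = 0  # index into INSTRUMENTS

def segment_for_x(x: int, pad_width: int, n_segments: int) -> int:
    """Map a touch x-coordinate to one of n_segments equal-width vertical strips."""
    x = min(max(x, 0), pad_width - 1)  # clamp to the pad surface
    return x * n_segments // pad_width

def on_touch(state: ControllerState, x: int, pad_width: int = 1920):
    """Switch instrument on touch; return (led_rgb, rumble_ms) feedback to send."""
    segment = segment_for_x(x, pad_width, len(INSTRUMENTS))
    if segment == state.instrument:
        return INSTRUMENTS[segment][1], 0  # same instrument: keep colour, no rumble
    state.instrument = segment
    # A short 'shockwave' rumble confirms the switch (duration chosen arbitrarily)
    return INSTRUMENTS[segment][1], 80
```

Touching the right-hand quarter of the pad selects the fourth slot and returns its colour with a rumble. Touching the same quarter again returns the colour without a rumble.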

Staying on the topic of the LED panel: light, or even sound played directly from the controller, could communicate different events or states, such as the rhythm. Haptic feedback through vibration or the adaptive triggers could likewise indicate the rhythm and enable precise timing. Coming back to the various kinds of haptic feedback, precise vibrations or rumbles would let the current beat be felt like a little bass drum. Even wilder, whatever sound is currently being created with the controller could be turned into a vibration pattern, making the newly made music ‘tactile’ and adding an interesting layer of immersion and feedback.
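One way to make the music ‘tactile’ is to turn the loudness of each audio block into a rumble strength. The following is a minimal sketch under stated assumptions. It assumes samples in the range [-1, 1] and a hypothetical motor intensity scale of 0 to 255. The -40 dBFS floor is an arbitrary choice, and sending the value to the controller is not shown.

```python
import math

def rms(block):
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    return math.sqrt(sum(s * s for s in block) / len(block)) if block else 0.0

def rumble_intensity(block, floor_db=-40.0, max_intensity=255):
    """Map a block's RMS level (in dBFS) linearly onto a rumble motor intensity."""
    level = rms(block)
    if level <= 0.0:
        return 0
    db = 20.0 * math.log10(level)
    if db <= floor_db:
        return 0  # too quiet to be felt: keep the motor off
    scaled = (db - floor_db) / -floor_db  # 0 at the floor, 1 at 0 dBFS
    return round(min(scaled, 1.0) * max_intensity)
```

The scale is logarithmic (decibels) rather than linear so that quiet passages still produce a noticeable vibration instead of vanishing.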

As for the other input methods that take advantage of the different sensors: the gyro sensor could map movements to music, similar to a theremin, and the accelerometer could trigger events such as a change in tempo or a drop. Using the built-in speaker as an output could be helpful in extreme situations, but perhaps only for something small like a metronome. The headphone jack, on the other hand, could come in handy at every opportunity.
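The sensor ideas above could be sketched as two small mappings. The first maps a tilt angle from the gyro exponentially onto a pitch, so that equal tilts give equal musical intervals, much as a theremin player moves through octaves. The second fires a one-shot event when the acceleration exceeds a threshold. Several assumptions apply: the angle is taken to be already integrated from the gyro data, and the range of ±90°, the base pitch of 110 Hz, the three octaves and the thresholds in g are all hypothetical values.

```python
import math

def tilt_to_frequency(angle_deg, low_hz=110.0, octaves=3.0, max_angle=90.0):
    """Map a tilt angle in [-max_angle, max_angle] exponentially onto a pitch range."""
    angle = max(-max_angle, min(max_angle, angle_deg))
    position = (angle + max_angle) / (2 * max_angle)  # 0..1 across the tilt range
    return low_hz * 2.0 ** (position * octaves)

class ShakeDetector:
    """Fire once when acceleration magnitude crosses a threshold (with hysteresis)."""

    def __init__(self, on_threshold=2.5, off_threshold=1.5):
        self.on_threshold = on_threshold    # in g; arbitrary choice
        self.off_threshold = off_threshold  # must drop below this to re-arm
        self.active = False

    def update(self, ax, ay, az):
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if not self.active and magnitude > self.on_threshold:
            self.active = True
            return True  # e.g. trigger a drop or a tempo change
        if self.active and magnitude < self.off_threshold:
            self.active = False
        return False
```

The hysteresis (separate on and off thresholds) keeps a single strong shake from firing the event several times in a row.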

All in all, using a modern controller like the DualSense could really open up new ways to make, and literally ‘play’, music.

_Literature & Resources

Music Information Retrieval 2 – Algorithms

Now that we have some questions about what we want to find in the signal, we can look for algorithms that provide that information. Here are the questions again:

  • Is the sound noiselike or tonelike?
  • Is the sound bright or dark in its sonic character? 
  • What is the rate of change?

We have not found an answer to the last question yet, but we did find an algorithm that I would personally like to experiment with: the MFCC.

Spectral Flatness 

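Spectral flatness, also called the tonality coefficient or Wiener entropy, is a spectral measure that quantifies how tone-like or noise-like a sound is. It is calculated as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The result lies between 0 and 1, or, expressed in decibels, between minus infinity and 0 dB. Values near 0 dB (a ratio near 1) indicate a flat, noise-like spectrum. Values approaching minus infinity indicate a spectrum dominated by a few sine waves. The measure can also be applied to sub-bands rather than across the whole band.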

Since the output is a single number, applying it could be quite straightforward. Distinguishing between tonal and non-tonal content in a musician’s sonic repertoire could give great insight into that musician’s performative intent.
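The definition above can be turned into a few lines of NumPy. This is a minimal single-frame sketch, not the implementation used by any particular plugin or module. The Hann window and the small `eps` constant (which avoids taking the logarithm of zero) are assumptions.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of one frame's power spectrum.

    Returns (ratio, ratio_db): ratio is near 0 for a pure tone and near 1 for
    white noise; ratio_db is the same value in decibels (<= 0 dB).
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    ratio = geometric_mean / arithmetic_mean
    return ratio, 10.0 * np.log10(ratio)
```

A 1 kHz sine gives a ratio close to 0 (a large negative dB value), while Gaussian white noise gives a ratio of roughly one half. That gap is what would make the measure usable as a noise-versus-tone control signal.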

Spectral Entropy:

Spectral entropy can be computed over a chosen number of sub-bands. With a single band, it measures the general peakiness of the spectral distribution: a peaky spectrum gives a low entropy, a flat one a high entropy.
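A minimal sketch of spectral entropy with a selectable number of sub-bands is shown below. It assumes that the power spectrum within each band is normalised to a probability distribution and that entropy is measured in bits. Splitting the bins into equal-sized bands with `np.array_split` is a simplification; other implementations may space the bands differently.

```python
import numpy as np

def spectral_entropy(frame, n_bands=1):
    """Shannon entropy (bits) of the normalised power distribution in each sub-band.

    Low values mean a peaky (tonal) spectrum, high values a flat (noisy) one.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    entropies = []
    for band in np.array_split(power, n_bands):
        total = band.sum()
        if total <= 0:
            entropies.append(0.0)  # silent band
            continue
        p = band / total
        p = p[p > 0]  # 0 * log(0) is taken as 0
        entropies.append(float(-(p * np.log2(p)).sum()))
    return entropies
```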

Spectral Percentile:

This calculates the cumulative distribution of spectral energy across the frequency spectrum and outputs the frequency below which the desired percentage of the energy lies. With a high percentile such as 90%, this gives the spectral roll-off point, which can indicate, for example, the cutoff frequency of a filter.
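The cumulative-energy idea can be sketched as follows. This is a minimal sketch that assumes a Hann-windowed single frame and returns the centre frequency of the first FFT bin at which the running sum of power reaches the requested fraction. The result is therefore only as precise as the bin spacing.

```python
import numpy as np

def spectral_percentile(frame, sample_rate, fraction=0.9):
    """Frequency (Hz) below which `fraction` of the frame's spectral power lies."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    cumulative = np.cumsum(power)
    # First bin whose cumulative power reaches the target share of the total
    index = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(freqs[index])
```

For a low-passed signal, the 90% point moves down as the filter closes. This is why the percentile could serve as a rough stand-in for a filter's cutoff.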

Spectral Centroid

This measures the spectral centroid, which is the weighted mean frequency, or the “center of mass” of the spectrum. This means it can determine if the measured signal leans more on the bright or dull side.
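The ‘centre of mass’ can be computed directly from one FFT frame. This minimal sketch assumes that the magnitude spectrum is used as the weighting. Some implementations weight by power instead, which shifts the result towards the strongest partials.

```python
import numpy as np

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame: higher means brighter."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = magnitude.sum()
    return float((freqs * magnitude).sum() / total) if total > 0 else 0.0
```

For a single sine, the centroid lands close to the sine's frequency. Adding upper harmonics, as when an instrument is played more vigorously, pushes it upwards.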

Mel Frequency Cepstral Coefficients

MFCCs are "a small set of features of a signal (usually about 10-20) which concisely describe the overall shape of a spectral envelope. In MIR, it is often used to describe timbre." (https://musicinformationretrieval.com/mfcc.html) Because they consist of many values, they are problematic to use directly as a modulation source in a Eurorack environment. With a better understanding of the output, however, one or more control voltages might be derived from them.
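To get a feel for what the coefficients are, the usual pipeline can be sketched for a single frame: power spectrum, then a mel-spaced triangular filterbank, then a logarithm, then a DCT. This is a minimal textbook-style sketch, not the implementation of any particular library. The choices of 26 filters, 13 coefficients and the common 2595·log10(1 + f/700) mel formula are assumptions, and other tools use slightly different variants.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fbank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle shape
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II, keeping only the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    out = (basis @ x) * np.sqrt(2.0 / n)
    out[0] /= np.sqrt(2.0)
    return out

def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel filterbank -> log -> DCT."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energies = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    return dct_ii(np.log(mel_energies + 1e-10), n_coeffs)
```

Each frame yields 13 numbers. One hypothetical way to derive control voltages from them would be to scale a single coefficient, or the distance between the current and a reference MFCC vector, into a CV range. Which mapping is musically useful would have to be found by experiment.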

Music Information Retrieval 1 – What?¿?

Our planned module, like some modules on the market, will derive pitch, gate, and envelope information from an input signal. One property that sets it apart from them is its use of Music Information Retrieval (MIR). This relatively young but growing field of research seeks to make music machine-readable using techniques such as machine learning. Music distribution today is largely handled by streaming services, so quick implementation and organization are crucial to monetizing media collections and keeping up with the market. This rather economic approach to music is merely one of the benefits of MIR. Many of its capabilities can be useful tools in music education and music production, and they show how MIR can be used in an artistic sense. Examples include:

  • source separation to create stems
  • transcription for notation programs
  • pitch tracking, tempo estimation, and beat tracking for converting audio to MIDI
  • detecting the chords of a song while it is playing
  • Autotune
  • key detection to quickly program quantizers in electronic music devices

There is more than one method of retrieving musical information. Some work with data sources derived mostly from digital audio formats such as .wav, .mp3, and .ogg. Many of those formats are lossy, and machine listening is more susceptible to artifacts than the human ear, yet much research in the field uses them as data. Additionally, more and more metadata is mined from the web and incorporated into MIR for a better understanding of music in its cultural context.

Statistics and machine learning also play an important role in this field of research. Many methods compare music against databases and thereby derive information about the music in question.

Because our module is meant for performance, information has to be retrieved almost immediately from the signal fed into it, without spending computational time on searching databases. Feature representation is therefore the method of choice, for instance gaining information quickly through an FFT. The music is analysed by summarising it, which is done by feature extraction. This summary has to be a feature representation reduced enough to yield a manageable set of values within a reasonable time frame.
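The "reasonable time frame" has a hard lower bound: an FFT frame of N samples cannot be analysed before all N samples have arrived. A larger frame gives finer frequency resolution but more delay. The short sketch below makes this trade-off concrete. The 48 kHz sample rate and the frame sizes are only example values.

```python
def frame_latency_ms(frame_size, sample_rate):
    """Time (ms) needed just to fill one analysis frame before the FFT can run."""
    return 1000.0 * frame_size / sample_rate

def frequency_resolution_hz(frame_size, sample_rate):
    """Spacing (Hz) between neighbouring FFT bins for a frame of this size."""
    return sample_rate / frame_size

for size in (256, 512, 1024, 2048):
    print(f"{size:5d} samples: {frame_latency_ms(size, 48000):6.2f} ms, "
          f"{frequency_resolution_hz(size, 48000):7.3f} Hz per bin")
```

At 48 kHz, a 1024-sample frame already adds about 21 ms before any computation starts. This is one reason why compact features computed on short frames suit a performance module better than database lookups.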

As we consider the possibilities of MIR, we should ask ourselves what we could retrieve from the signal to learn something about the expression of the musician playing into the synth. I did a short brainstorming session with Prof. Ciciliani, and we came up with a few parameters that we decided make sense in a live performance.

Is the sound noiselike or tonelike?

This would give information about the sound coming from the instrument and whether there is a pitch to extract.

Is the sound bright or dark in its sonic character?

This gives information about the playing technique and, depending on the instrument, about a form of expression, since many instruments emit additional upper harmonics when played more vigorously.

What is the rate of change?

This can be interpreted in several ways. Over a longer period, it could generate additional modulation after a phrase, creating a kind of call and response, or, thinking outside the box, a ‘performance reverb’. Alternatively, in addition to the envelope follower, the attack ramps of the signal could be compared to create a kind of punch trigger when the playing gets more intense.
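The punch trigger idea can be sketched by comparing each new attack slope with the average of earlier ones. This is a minimal, hypothetical sketch and not a planned design. It uses the peak of each audio block as a crude envelope and treats the rise from one block to the next as the attack slope. An attack counts as ‘punchy’ when it is more than `ratio` times steeper than the running average. The ratio of 2 and the smoothing factor are arbitrary choices.

```python
class PunchDetector:
    """Flag attacks that are much steeper than the recent average attack."""

    def __init__(self, ratio=2.0, smoothing=0.9):
        self.ratio = ratio          # how much steeper than average counts as "punch"
        self.smoothing = smoothing  # weight of history in the running mean
        self.prev_env = 0.0
        self.mean_slope = None      # no reference until the first attack

    def process(self, block):
        env = max(abs(s) for s in block) if block else 0.0  # block peak as envelope
        slope = env - self.prev_env
        self.prev_env = env
        if slope <= 0.0:
            return False            # only rising envelopes (attacks) are compared
        if self.mean_slope is None:
            self.mean_slope = slope
            return False
        punch = slope > self.ratio * self.mean_slope
        self.mean_slope = self.smoothing * self.mean_slope + (1 - self.smoothing) * slope
        return punch
```

In use, `process` would be called once per audio block, and a `True` result would fire a short gate at a CV output.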

Hardware 2 – Pepper and Bela IDE

The Bela Starter Kit comes with a BeagleBone and the Bela Cape extension, which houses a myriad of IOs. This kit will connect to Bela Pepper, a PCB with a matching faceplate for integrating the BeagleBone into a modular system. The assembly of the PCB is described on bela.io with an illustrated manual and a bill of materials for building the DIY kit. This will be my task on my days off in February.

BeagleBone + Bela Cape ©bela.io

Pepper will be an 18 HP module that provides stereo IO, 8 CV IO, 8 CV offset potentiometers, 4 buttons, and 10 LEDs for connecting the BeagleBone to my modular system. A faceplate for a USB breakout is also included.

Bela Pepper assembled ©bela.io

To get my code onto the BeagleBone, the different Bela boards provide a browser-based Integrated Development Environment (IDE). An IDE is a set of tools for a programmer to develop, test, and debug software. In the Bela IDE, one can program in C++, Pure Data, SuperCollider, or Csound. It contains example code in every supported language for learning the basic skills needed to use the Bela hardware. Additionally, there is a pin diagram that identifies all the pins on the respective board; in my case, as mentioned, the BeagleBone. It also includes a library of pre-coded functions.

Bela IDE

Expression 2 – Quantising

So there is a multitude of values that could be extracted to pick up a musician’s expression in performance. If the music is written down, some of it can be read from the sheet music. Some of it, however, is the musician’s individual expression, which is far more abstract and much more difficult to pick up because it cannot be predefined or calculated. So we have to quantify expression somehow, directly from the performance. In his opinion article, Clemens Wöllner suggests quantifying artistic expression with averaging procedures.

A major purpose of expression is to make the piece one is playing more attractive, to the point of making it one’s own in performance. Individuality is highly valued in a performer’s expression. Studies in cognitive psychology show that averaged stimuli in the visual and auditory modalities are perceived as more attractive. Averaging procedures typically produce very smooth results in pictures and sound. Yet listeners typically expect more from a concert or a recording than an even performance; as mentioned, individuality is highly appreciated in music.

In classical genres, expression is often added through subtle timing perturbations and fluctuations in dynamic intensity. Unexpected delays or changes in intensity that differ from the listener’s expectations can cause surprise and other emotional reactions, and thus support the individual performer’s musical expression. In the earlier decades of the 20th century, for instance, musicians typically employed large rubati, that is, deviations in note length, mostly in the melody voice. This is less common today, and changes in note length are far smaller. Research along these lines has long studied expressive timing deviations from a non-expressive, metronomic version. These timing deviations constitute an individual expressive microstructure. However, performers are not able to render a perfectly mechanical, metronomically exact performance. Quantifying timing variations against such a so-called deadpan rendition as the reference therefore cannot be a valid indicator of individuality.

So musical performances can be averaged along the main quantifiable dimensions of duration, dynamic intensity, and pitch. As for the average performance, seminal studies by Repp (1997) suggested that attractiveness is raised by not deviating from the average, expected performance. However, a performance that shows no individuality, never straying from the average, is also considered dull.
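Averaging along the duration dimension can be illustrated with inter-onset intervals (IOIs), the times between successive note onsets. The sketch below is a simple illustration of the idea, not Wöllner’s or Repp’s actual procedure. It averages several performances of the same passage note by note and then measures how far each performance strays from that average. The IOI values are made up for the example.

```python
from statistics import mean

def average_performance(performances):
    """Per-note mean of inter-onset intervals (seconds) across several performances."""
    return [mean(notes) for notes in zip(*performances)]

def deviation_from_average(performance, average):
    """Mean absolute relative deviation (in %) of one performance from the average."""
    return 100.0 * mean(abs(p - a) / a for p, a in zip(performance, average))

# Made-up IOIs for three renditions of the same three-note phrase
performances = [
    [0.5, 0.5, 1.0],
    [0.6, 0.5, 0.9],
    [0.4, 0.5, 1.1],
]
average = average_performance(performances)
for p in performances:
    print(f"{deviation_from_average(p, average):.1f} % from the average")
```

The same two functions would work for dynamic intensity if the IOIs were replaced with, for example, per-note peak levels.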

Averaged deviations from the notated pitch in equal temperament could also be analysed. The sharpening or flattening of tones may reveal certain expressive intentions of individual performers. Musicians are also able to shape the timbre of certain instruments to some extent, which adds to their expression.
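The deviation from equal temperament is conventionally measured in cents (hundredths of a semitone). For a detected frequency f, the distance from A4 in semitones is 12·log2(f / 440). The nearest equal-tempered note and the remaining deviation follow from that value. The minimal sketch below assumes a reference tuning of A4 = 440 Hz and returns MIDI note numbers.

```python
import math

def nearest_et_pitch(freq_hz, a4_hz=440.0):
    """Return (MIDI note number, deviation in cents) relative to 12-tone equal temperament."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / a4_hz)
    midi_note = round(69 + semitones_from_a4)               # A4 is MIDI note 69
    cents = 100.0 * (69 + semitones_from_a4 - midi_note)    # -50..+50 cents
    return midi_note, cents
```

A note played at 445 Hz comes out as A4, about 20 cents sharp. Averaging such values over many notes of a performance would show whether a player tends to sharpen or flatten.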

(see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3685802/#!po=71.7391, accessed 30.12.2021, 20:12)

Hardware 1 – DSP Boards

Which microcontrollers and DSP chips are readily available to power the interface module? That is a central question when starting to work out how to implement MIR algorithms in a module. The second question is which programming languages are compatible with the chips and how one can implement code on them.

These questions are examined in a paper from the International Conference on New Interfaces for Musical Expression (NIME) titled "A streamlined workflow from Max/gen~ to modular hardware" by Graham Wakefield (2021). It focuses on the Oopsy workflow, which streamlines getting digital sound processing algorithms to work in the modular synthesizer environment.

As microcontrollers such as Arduino and Teensy get more powerful by the day, they become more and more useful to musicians and luthiers for music and musical instruments. The desire to play electronic music live without a laptop running a DAW is a strong motivation for musicians to get into coding. They learn to develop equipment that often provides the few DAW tools they actually need for live performance.

For DSP chips to run code programmed in a visual language like Pure Data or Max/MSP, the patch usually has to be converted into C++ code. Within Max, for instance, the [gen~] object is capable of doing so. Oopsy was developed to get such patches onto hardware efficiently. It streamlines the workflow with targeted firmware generation that is optimized for CPU usage, a low memory footprint, and small program size, while requiring minimal input.

Electrosmith Daisy:

  • Processor: ARM Cortex-M7 STM32H750 MCU with 64 MB of SDRAM and 8 MB of flash memory
  • IO: stereo audio, 31 configurable GPIO pins, 12× 16-bit ADCs, 2× 12-bit DACs, SD card interface, PWM outputs, micro USB port (power and data)
  • Size (Daisy Seed): 51 × 18 mm

Daisy Seed © electro-smith.com

It is a common microcontroller in modular synth gear today. With a maximum clock speed of 480 MHz, the MCU is quite capable. The AK4556 codec provides AC-coupled converters, while audio is processed internally as 32-bit floating point. Daisy firmware can be developed using Arduino, FAUST, Pure Data via Heavy, and Max/gen~ using the Oopsy software. Internal latency goes down to 10 microseconds.

Bela Beaglebone:

Bela is an open-source platform for live audio based on the BeagleBone single-board computer. It is compatible with SuperCollider, Pure Data, and C++. It is optimized for ultra-low latency: at 0.5 ms, its latency is lower than that of desktop, smartphone, Arduino, and Raspberry Pi solutions.

Bela Starter Kit © bela.io

OWL Programmable Platform:

8 kHz to 96 kHz sampling rate, 24-bit stereo codec, 3500 operations per sample at 48 kHz, powerful STM32F4 microcontroller (168 MHz 32-bit ARM Cortex-M4), 192 KB RAM, 1 MB flash memory, integrated DSP, FPU, DMA, 1Mb 10 ns SRAM, USB MIDI

Rebel Technology, OWL Digital mk2 Rev 7 © https://shop.befaco.org/misc/1091-rebel-technology-owl-digital-platform.html

IO Eurorack module: 2 audio inputs, 2 audio outputs, 5 CV inputs, 1 gate/trigger in, 1 gate/trigger out, 1 USB Type B connector
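The figure of 3500 operations per sample quoted above matches the ratio of clock rate to sample rate, assuming roughly one operation per clock cycle. That assumption is a simplification, since real throughput depends on the instruction mix and memory access. The same arithmetic gives a rough way to compare the processing budgets of the boards listed here:

```latex
\frac{168\ \text{MHz}}{48\ \text{kHz}}
  = \frac{168 \times 10^{6}\ \text{cycles/s}}{48 \times 10^{3}\ \text{samples/s}}
  = 3500\ \text{cycles per sample}
\qquad
\frac{480\ \text{MHz}}{48\ \text{kHz}} = 10\,000\ \text{cycles per sample (Daisy)}
```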


Graham Wakefield. 2021. A streamlined workflow from Max/gen~ to modular hardware. Proceedings of the International Conference on New Interfaces for Musical Expression. http://doi.org/10.21428/92fbeb44.e32fde90.




Expression 1 – Definition

Illustration of an ADSR envelope. ©Christoph Bus

For the musician–synthesizer interface, it is important to translate pitch, amplitude envelope, and note length. But these are only the most basic values for translating music into sound. Pitch and relative length are defined by sheet music, for example, and the envelope by the characteristics of the instrument played. The most common envelope found in synthesizers is the ADSR shape. It has four stages:

  • ‘attack’: the duration of the rising ramp of the signal
  • ‘decay’: the duration of the falling ramp that starts once the attack has reached its peak
  • ‘sustain’: the level held as long as the gate is open
  • ‘release’: the duration of the signal falling from its last value to zero after the gate is closed

This is also one of the simplest ways to portray the amplitude envelopes of many acoustic instruments.
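The four ADSR stages described above can be sketched as a function of time. This minimal sketch assumes straight-line ramps (many synthesizers use exponential curves instead), times in seconds and a peak level of 1. The function name and parameters are illustrative. As described above, the release starts from whatever level the envelope had reached when the gate closed.

```python
def adsr_level(t, gate_time, attack, decay, sustain, release):
    """Linear ADSR level at time t (s) for a gate that is held for gate_time seconds."""
    if t < 0:
        return 0.0

    def held_level(u):
        # Level while the gate is open: attack ramp, decay ramp, then sustain
        if u < attack:
            return u / attack
        if u < attack + decay:
            return 1.0 - (1.0 - sustain) * (u - attack) / decay
        return sustain

    if t < gate_time:
        return held_level(t)
    start = held_level(gate_time)  # release begins from the last value reached
    if t < gate_time + release:
        return start * (1.0 - (t - gate_time) / release)
    return 0.0
```

For example, with a 10 ms attack, 100 ms decay, sustain at 0.5 and a 200 ms release, a gate held for one second gives 0.5 at t = 0.5 s and 0.25 halfway through the release.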

Illustrations of different Envelope shapes depicting acoustic instruments. ©Christoph Bus

But the timbral structure of a sound is usually not described by its amplitude envelope alone. Many musical instruments are defined by variations in pitch and in the colour of the sound. So the simple amplitude picked up by an envelope follower is a very basic tool for defining the sound of a musician. Furthermore, it only captures the most basic values a musician puts into their instrument. To capture a musician more fully, their expression therefore plays a big role in how control voltages are derived.

So how can we define musical expression? As mentioned, most Western music notation writes down pitch and relative length. Tempo, dynamics, and directions for technique are given in words or abbreviations. But the finer points of a performance, which are mostly inherent to each musician’s individuality, are found nowhere except in the musician’s playing. The common Italian tempo markings are widely known, roughly from slow to fast: adagissimo, adagio, lento, andante, andantino, allegretto, allegro, presto, prestissimo. (Britannica)

As for dynamics, roughly from quiet to loud: pianissimo, piano, mezzo piano, mezzo forte, forte, fortissimo. Some markings describe changes in dynamics: fortepiano (loud then soft), sforzando (sudden accent), crescendo (gradually louder), diminuendo (gradually softer).

These are nowadays the definitions a composer uses to convey their musical ideas to the performer. But it wasn’t always like this.

"…[I]n much 17th- and 18th-century music, the composer notated only the main structural notes of the solo part, leaving the performer to improvise ornamental figuration."


These figurations or ornamentations gave musicians the freedom to express themselves and to influence the musical tradition of their time.

Excerpt from a sonata by Arcangelo Corelli da Fusignano, Opera Quinta

Here you can see that the bottom two lines are the composer’s structure of the piece, while the top line shows the individual ornaments an artist added over it.

Reference 2 – Expression Hardware


In modern MIDI keyboards, there are several ways to record expression.

The most widespread feature is velocity control. This parameter is determined by the speed at which one hits the keys, so keyboard players can easily add it to their playing, just as they would on an acoustic instrument.

With the synthesizer came the possibility of shaping an instrument’s sound individually for a performance. With it came the possibility of controlling sound parameters that acoustic or electro-acoustic keyboard instruments, with only their keys and pedals, hardly allowed. The pitch and mod wheels were introduced to make such changes possible. The first is a spring-loaded wheel or stick, mostly used to bend the pitch as on a guitar. The second is an adjustable wheel with which one can send fixed values or modulate them manually.

The fourth modulation source developed for keyboard synthesizers is aftertouch. As the name suggests, it is applied by changing the pressure on a key after it has been depressed. It can act monophonically (one value for the whole keyboard) or polyphonically (one value per key).

All of these controls added to the expressivity of synthesizer performances. Only one of them, velocity, is determined before or as the tone is generated; the others act on the sound after it has started. So these are four control values that have proven to add expressivity in performance.
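All four of these controls travel as standard MIDI channel messages. In each status byte, the upper four bits give the message type and the lower four bits give the channel. The minimal sketch below decodes raw message bytes into normalised values that could later be scaled to control voltages. The function name and the tuple format it returns are hypothetical. The message layouts follow the MIDI 1.0 specification: note-on with velocity 0 is treated as a note-off, and pitch bend is a 14-bit value sent LSB first with 8192 as the centre.

```python
def parse_expression(message):
    """Decode the four classic keyboard expression controls from a raw MIDI message.

    Returns (control, note_or_None, value) with value normalised to 0..1
    (pitch bend to -1..1), or None for any other message.
    """
    status = message[0] & 0xF0  # upper nibble = message type, lower nibble = channel
    if status == 0x90 and message[2] > 0:          # note on: velocity
        return ("velocity", message[1], message[2] / 127)
    if status == 0xE0:                              # pitch bend: 14-bit, LSB first
        value = message[1] | (message[2] << 7)
        return ("pitch_bend", None, (value - 8192) / 8192)
    if status == 0xB0 and message[1] == 1:          # control change 1: mod wheel
        return ("mod_wheel", None, message[2] / 127)
    if status == 0xD0:                              # channel pressure (mono aftertouch)
        return ("aftertouch", None, message[1] / 127)
    if status == 0xA0:                              # polyphonic key pressure
        return ("poly_aftertouch", message[1], message[2] / 127)
    return None
```

The design normalises everything to a common range first, so that the same scaling stage can map any of the four controls onto a CV output.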

Of course, these weren’t the only tools developed for very expressive performances, although they are the most common. There is a multitude of controllers for adding expression to an electronic music performance:

  • the Expressive E ‘Touché’ or ‘Osmose’
  • Buchla and Serge capacitive keyboards
  • joystick controllers on synths like the EMS Synthi and on Korg devices like the Sigma or the Delta
  • joystick controller modules for Eurorack, 5U, Buchla, and Serge systems

Other Concepts

Then there are control surfaces that take an entirely different approach to the concept of the keyboard. These concepts often, but not always, go hand in hand with a synthesizer engine.

Haken Continuum

The Haken Continuum, for instance, is a synthesizer with a control surface that can detect movement in three axes.

The Haken Continuum Fingerboard is an instrument born to be as expressive and as rewarding to play as an acoustic instrument. The uniquely designed sensitive playing surface has been symbiotically merged with its powerful synthesis sound engine to produce a truly unique playing experience. The  Continuum is a holistic electronic instrument that puts its player at the heart of a uniquely fluent, gestural and intuitive musical playing experience.


Roli Seaboard

ROLI’s SEA Technology, which is implemented in its Seaboard controllers, is, as ROLI puts it:

“Sensory, Elastic and Adaptive. Highly precise, information-rich, and pressure-sensitive. It enables seamless transitions between discrete and continuous input, and captures three-dimensional gestures while simultaneously providing the user with tactile feedback.”


Roli Seaboard Rise 49


LinnStrument

The LinnStrument is a control surface developed by the famous instrument designer Roger Linn. What is interesting here is that it does not use a piano-style keyboard but a grid-style layout, which is more reminiscent of the tonal layout of string instruments such as the guitar. The LinnStrument also records release velocity, which places it even more firmly in guitar territory. There, pull-offs are a standard technique: the finger is rapidly pulled off a string to excite it and make it sound.
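The string-like grid layout can be sketched in two functions. This is a minimal sketch with assumptions. It assumes that each row sits a fourth (5 semitones) above the one below, as on a bass guitar. The 8 × 25 grid size and the lowest note (MIDI 30) are illustrative rather than a description of the instrument’s actual defaults. As on a guitar neck, the same pitch then appears at several positions.

```python
def grid_note(row, column, lowest_note=30, row_interval=5):
    """MIDI note of a grid pad, with rows tuned in fourths like a bass guitar."""
    return lowest_note + column + row * row_interval

def positions_for_note(note, rows=8, columns=25, lowest_note=30, row_interval=5):
    """All (row, column) pads that play a given note: several, as on a guitar neck."""
    return [(r, c) for r in range(rows) for c in range(columns)
            if grid_note(r, c, lowest_note, row_interval) == note]
```

Here, note 35 can be played either five pads along the bottom row or at the start of the second row. That choice of fingering positions is typical of string instruments and absent from a piano keyboard.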


Few, if any, of the control surfaces examined here offer more than four modulatable values. Four would therefore be a minimum for a module meant to translate an instrumentalist’s expression into control voltages.

Reference 1 – Sarah Belle Reid

Copyright © Sarah Belle Reid, twitter

Sarah Belle Reid is a trumpet player and synthesist who takes the sound of her brass instruments and puts it through modular systems such as Buchla, Serge, or Eurorack. She has developed a device called MIGSI, with which she translates her trumpet playing into CV and/or MIDI messages.

MIGSI was developed largely to let all the techniques Sarah Belle Reid has developed on her instrument carry over into more than ‘just’ her instrument, opening the instrument up to the possibilities of electronic music-making.


MIGSI stands for Minimally Invasive Gesture Sensing Interface; she also calls it an ‘electronically augmented trumpet’. The device was co-developed by her and Ryan Gaston around 2014. They also founded ‘Gradient’, a joint venture in which they develop "handmade sound objects that combine elements of the natural world with electronic augmentation." (cf.: Gradientinstruments.com).

MIGSI is a sensor-based interface with three types of sensors and eight data streams:

  • pressure sensors around the valves, which read the grip force
  • an accelerometer, which senses the movement of the trumpet
  • optical sensors, which read the movement of the valves
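Turning raw sensor streams like these into usable control signals generally involves scaling and smoothing. The sketch below is a hypothetical illustration of that step, not MIGSI’s actual processing, which happens in its Max patch. It assumes a 10-bit ADC reading (0 to 1023), a 0 to 5 V output range and a one-pole low-pass filter to tame jitter; all of these values are assumptions.

```python
def sensor_to_cv(raw, raw_min=0, raw_max=1023, cv_max=5.0):
    """Scale a raw sensor reading (hypothetical 10-bit ADC) to 0..cv_max volts."""
    clamped = min(max(raw, raw_min), raw_max)
    return cv_max * (clamped - raw_min) / (raw_max - raw_min)

class Smoother:
    """One-pole low-pass filter that steadies a jittery sensor stream before it becomes CV."""

    def __init__(self, coeff=0.8):
        self.coeff = coeff  # closer to 1 = smoother but slower to respond
        self.state = None

    def __call__(self, x):
        if self.state is None:
            self.state = x  # start from the first reading instead of from zero
        else:
            self.state = self.coeff * self.state + (1 - self.coeff) * x
        return self.state

# One smoother per data stream, e.g. eight for eight sensor streams
smoothers = [Smoother() for _ in range(8)]
```

The smoothing coefficient trades responsiveness for stability: valve movements might want a fast response, while grip pressure could be smoothed more heavily.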


The sensor data is then read by the MIGSI app, which is a Max patch. The app is used to process the audio signal of the trumpet, to modulate external equipment with the sensor input, or to modulate a synth engine inside the MIGSI app.

(cf.: https://www.youtube.com/watch?v=tbXgUDQaNv0)