We recently submitted a model to the AI Music Generation Challenge 2021. The task in this competition is to build a system that generates idiomatic slängpolska – a traditional dance form from Scandinavia.
Our main focus is to develop a generative model that gives the user a high degree of control. The user can specify the harmony, texture coherence (similarity among bars), phrase endings, cadence types, and rhythmic density across multiple levels. This complements neural end-to-end models, which currently dominate the music-generation field but do not provide such controllability out of the box. For future research, we envision that our work can contribute to closing the gap between end-to-end models and models that are easy to control.
The code and generated results from our model can be found here. If you have any questions about this project, don’t hesitate to send us an email at:
- Zeng Ren, [email protected]
- Christoph Finkensiep, [email protected]
- Daniel Harasim, [email protected]
Our Model
The model generates a piece in three stages.
Stage 1: Specifying Form
The input of the model is a collection of form templates that specify information on the bar level. For each bar, five features are specified:
- Harmony (scale degree)
- Marker for coherence structure (which bars should be exact copies of, similar to, or different from each other)
- Flag for phrase ending that indicates whether the bar’s last note must be a guide tone
- Cadence (perfect authentic cadence or half cadence or none)
- Maximum number of elaborations (to control rhythmic density)
As a starting point for a user, we currently provide a set of four pre-specified form templates.
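For illustration, one bar of a form template could be represented as a small record holding the five features listed above. The field names and values below are our own sketch, not the project's actual data format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BarSpec:
    """Hypothetical per-bar entry of a form template (names are our own)."""
    harmony: str            # scale degree, e.g. "I" or "V"
    coherence_marker: str   # bars sharing a marker copy or imitate each other
    phrase_ending: bool     # if True, the bar's last note must be a guide tone
    cadence: Optional[str]  # "PAC" (perfect authentic), "HC" (half), or None
    max_elaborations: int   # cap on elaboration steps (rhythmic density)

# Last two bars of an 8-bar period, closing with a perfect authentic cadence:
period_end = [
    BarSpec("V", "c", False, None, 3),
    BarSpec("I", "d", True, "PAC", 1),
]
```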
An example form template (8-bar period):
Stage 2: Generating Guide Tones
For each non-cadence bar that has a unique coherence marker, we assign a random integer k (from 0 to 6) to each beat. The guide tone for each beat is then determined as the k-th chord tone (in ascending order of pitch height) within a pre-specified piece range. For example, for a C major triad in the range C3-C5, chord tones 0 and 2 are C3 and G3, respectively.
To enforce the coherence specified by the form template, we then assign this integer to all bars that have the same coherence marker.
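The k-th chord tone lookup described above can be sketched in a few lines of Python. We use MIDI pitch numbers and pitch classes as an assumed representation; the actual implementation may differ:

```python
def chord_tones_in_range(pcs, lo, hi):
    """All pitches in [lo, hi] whose pitch class is in `pcs`, ascending."""
    return [p for p in range(lo, hi + 1) if p % 12 in pcs]

def guide_tone(pcs, lo, hi, k):
    """The k-th chord tone (0-based, ascending) within the piece range."""
    return chord_tones_in_range(pcs, lo, hi)[k]

# C major triad {C, E, G} in the range C3-C5 (MIDI 48-72):
c_major = {0, 4, 7}
guide_tone(c_major, 48, 72, 0)  # chord tone 0 -> C3 (MIDI 48)
guide_tone(c_major, 48, 72, 2)  # chord tone 2 -> G3 (MIDI 55)
```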
In the case of cadences, there is a fixed pool of guide-tone lines. One possibility is the scale degrees 3-2, 1, 1 (one octave lower), as in the last bar of the example above.
The guide tones in the example correspond to the example form template.
Stage 3: Elaboration
At each elaboration step, the model performs at most one operation per bar.
Here is an example of how the melody is elaborated iteratively from the guide tones (first line) to the final result (last line).
The colors represent the elaboration operations: LeftNeighbor (red) and RightNeighbor (orange), as well as Fill (blue), which is an umbrella operation for both arpeggiation and passing-tone motion. Also possible, but not used here, are the operations LeftRepeat and RightRepeat.
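A single elaboration step can be sketched as follows, assuming a bar is a list of MIDI pitches. The operation names follow the post, but the concrete pitch choices are simplified stand-ins for the real ones:

```python
import random

def left_neighbor(p):  return [p - 2, p]         # approach p from below
def right_neighbor(p): return [p, p + 2]         # leave p upward
def fill(p, q):        return [p, (p + q) // 2]  # passing/arpeggio stand-in

def elaborate_bar(bar, rng):
    """Apply at most one operation to a randomly chosen note of the bar."""
    i = rng.randrange(len(bar))
    op = rng.choice(["left", "right", "fill"])
    if op == "left":
        new = left_neighbor(bar[i])
    elif op == "right":
        new = right_neighbor(bar[i])
    else:
        nxt = bar[i + 1] if i + 1 < len(bar) else bar[i] + 4
        new = fill(bar[i], nxt)
    return bar[:i] + new + bar[i + 1:]

rng = random.Random(0)
bar = [60, 67]               # guide tones C4, G4
bar = elaborate_bar(bar, rng)  # one step: one note becomes two
```

Iterating this step, bounded by the bar's maximum number of elaborations from the form template, yields increasingly dense rhythms.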
The choice of operation and location, called an Action, is determined by a hand-tuned policy called RhythmBalancedPolicy at this stage of the project. As a next step, we plan to learn this policy from data. When the model encounters a bar whose coherence marker is already present in a previous bar, another policy called ImitatingPolicy determines the action on this bar. This component is essential for imitating previous material and thus enforces motivic and textural coherence.
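One simple way such an imitation mechanism could work is to record the actions applied to the first bar carrying a given coherence marker and replay them on later bars with the same marker. The class below is a hypothetical sketch, not the project's actual ImitatingPolicy:

```python
class ImitatingPolicy:
    """Hypothetical sketch: replay actions recorded for a coherence marker."""

    def __init__(self):
        self.history = {}  # coherence marker -> recorded action list

    def record(self, marker, actions):
        """Store the actions of the first bar carrying this marker."""
        self.history.setdefault(marker, list(actions))

    def actions_for(self, marker, fallback):
        """Replay recorded actions if the marker was seen before,
        otherwise fall back to another policy's choice."""
        return self.history.get(marker, fallback)

policy = ImitatingPolicy()
policy.record("a", ["Fill", "LeftNeighbor"])  # first bar with marker "a"
```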
Acknowledgements
This project has received funding from the European Research Council
(ERC) under the European Union’s Horizon 2020 research and innovation
program under grant agreement No 760081 – PMSB. We thank Claude Latour for supporting this research through the Latour Chair in Digital Musicology. We additionally thank the members of the Digital and Cognitive Musicology Lab (DCML) for the fruitful discussions.