Multitrack Music Transformer

About

Existing approaches for generating multitrack music with transformer models have been limited in the number of instruments, the length of the music segments, and inference speed. This is partly due to the memory requirements of the lengthy input sequences necessitated by existing representations. In this work, we propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length. Our proposed Multitrack Music Transformer (MMT) achieves performance comparable to state-of-the-art systems, landing between two recently proposed models in a subjective listening test, while achieving substantial speedups and memory reductions over both, making the method attractive for real-time improvisation or near-real-time creative applications. Further, we propose a new measure for analyzing musical self-attention and show that the trained model attends more to notes that form a consonant interval with the current note and to notes that are 4N beats away from the current step.

Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, Taylor Berg-Kirkpatrick • 2022
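
The two relationships named in the attention finding (consonant intervals and offsets of 4N beats) can be illustrated with a small sketch. This is not the paper's attention measure. The set of interval classes treated as consonant is an assumption based on a common music-theory convention, and the function names are hypothetical.

```python
# Interval classes (semitones modulo 12) treated as consonant in this sketch.
# Assumption: unison/octave, minor and major thirds, perfect fourth,
# perfect fifth, minor and major sixths. The paper's own definition may differ.
CONSONANT_CLASSES = {0, 3, 4, 5, 7, 8, 9}


def is_consonant(pitch_a: int, pitch_b: int) -> bool:
    """Return True if two MIDI pitches form a consonant interval (hypothetical helper)."""
    return abs(pitch_a - pitch_b) % 12 in CONSONANT_CLASSES


def is_4n_beats_apart(beat_a: int, beat_b: int) -> bool:
    """Return True if two beat positions differ by 4N beats for some integer N >= 1."""
    diff = abs(beat_a - beat_b)
    return diff > 0 and diff % 4 == 0


# Example: C4 (60) against G4 (67) is a perfect fifth; C4 against C#4 (61) is a minor second.
print(is_consonant(60, 67), is_consonant(60, 61))
# Example: beats 0 and 8 are 2 * 4 beats apart; beats 3 and 5 are not.
print(is_4n_beats_apart(0, 8), is_4n_beats_apart(3, 5))
```

With a 4/4 meter, a 4N-beat offset corresponds to the same position in a bar N bars earlier or later, which is one way to read the finding.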

Related benchmarks

Task	Dataset	Result
Symbolic music generation	SOD (test)	Mean NLL 0.632	11
Symbolic music generation	Lakh (test)	Mean NLL 0.376	11
Symbolic music generation	Pop1k7 (test)	Mean NLL 1.396	11
Symbolic music generation	POP909 (test)	Mean NLL 0.986	11
Round-trip duration steering	Symbolic Music 10 extreme songs (test)	W1 Duration Error (ticks) 2.27	8
Symbolic Music Generation (Duration Steering)	SOD (test)	Duration (ticks) 7.99	6
Symbolic Music Generation (Pitch Steering)	SOD (test)	Pitch (semitones) 67.94	6
Pitch Steering	SOD Low→Up→Dn scenario	W1 Mean Pitch Error (st) 49.4	4
Unconditional Symbolic Music Generation	Lakh MIDI and Meta MIDI (test)	PCE 0.103	4
Pitch Steering	SOD High→Dn→Up scenario	W1 Pitch Error (st) 80.93	4
