COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

About

We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of the stems composing music tracks and can take as input features obtained via Harmonic-Percussive Separation (HPS). COCOLA enables the objective evaluation of generative models for music accompaniment generation, which are difficult to benchmark with established metrics. To this end, we evaluate recent music accompaniment generation models, demonstrating the effectiveness of the proposed method. We release the model checkpoints trained on public datasets containing separate stems (MUSDB18-HQ, MoisesDB, Slakh2100, and CocoChorales).

Ruben Ciranni, Giorgio Mariani, Michele Mancusi, Emilian Postolache, Giorgio Fabbro, Emanuele Rodolà, Luca Cosmo • 2024
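
To make the idea of contrastive learning over stems concrete, here is a minimal sketch of a generic InfoNCE-style loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss, the encoder, or how HPS features are fed in. The function names (`cosine`, `info_nce_loss`), the temperature value, and the toy embeddings are all hypothetical. The sketch assumes that row i of `anchors` and row i of `positives` are embeddings of stems taken from the same track segment (a coherent pair), while every other row acts as a negative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    anchors[i] and positives[i] are treated as a coherent pair;
    positives[j] for j != i serve as negatives for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the true partner.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Hypothetical toy embeddings: stems_b[i] is close to stems_a[i].
stems_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
stems_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

coherent_loss = info_nce_loss(stems_a, stems_b)
# Rotating the partners pairs each anchor with a stem from a different "track".
mismatched_loss = info_nce_loss(stems_a, stems_b[1:] + stems_b[:1])
print(coherent_loss < mismatched_loss)
```

In this toy setup the loss is lower when each anchor is paired with its matching stem than when the partners are rotated, which is the behavior a coherence score would need in order to rank accompaniments.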

Related benchmarks

Task	Dataset	Result	Rank
Subjective Human Correlation for Musical Audio Coherence	MUSDB18-HQ (test)	Pearson ρ = 0.181	9
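
The result above is reported as a Pearson correlation coefficient. As a reference for how such a value is computed, here is a small sketch. The function name `pearson` and both score lists are hypothetical placeholders; they are not data from the benchmark, and the exact evaluation protocol behind the reported figure is not described on this page.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical values for illustration only (not benchmark data).
model_scores = [0.2, 0.5, 0.4, 0.9, 0.7]    # e.g. coherence scores from a model
human_ratings = [1.0, 3.0, 2.0, 5.0, 4.0]   # e.g. listener ratings for the same clips

rho = pearson(model_scores, human_ratings)
print(round(rho, 3))
```

A value near 1 means the model's scores rise and fall with the human ratings, a value near 0 means little linear relationship, and a value near -1 means they move in opposite directions.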
