Improving Music Source Separation with Diffusion and Consistency Refinement

About

In this work, we propose an approach to music source separation that uses a generative diffusion model as a last-stage refinement on top of a deterministic separator, progressively enhancing the separated sources through iterative denoising. While the diffusion refinement yields measurable quality gains, it requires iterative steps at inference, increasing computational cost. To speed up the inference process, we apply consistency distillation, reducing inference to a single step while maintaining quality; with two or more steps, the distilled model even surpasses the diffusion-based approach. Crucially, our method is architecture-agnostic: we demonstrate state-of-the-art results when applied to both a custom U-Net-based separator on Slakh2100 and the state-of-the-art BS-RoFormer model on MUSDB18, showing that the refinement generalizes across backbone architectures. Sound examples are available at: https://consistency-separation.github.io/.
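The single-step versus multi-step trade-off described above can be illustrated with a minimal sketch of generic multistep consistency sampling. This is not the authors' implementation: the function name `multistep_consistency_sample`, the callable `f`, the noise schedule `sigmas`, the `sigma_min` value, and the choice to start from a noised copy of the deterministic separator's output are all assumptions made for illustration.

```python
import numpy as np

def multistep_consistency_sample(f, x_init, sigmas, rng, sigma_min=0.002):
    """Hypothetical sketch of multistep consistency sampling.

    f(x, sigma) -- assumed consistency model: maps a sample at noise
                   level sigma directly to a clean estimate.
    x_init      -- initial source estimate (assumed here to come from a
                   deterministic separator).
    sigmas      -- decreasing noise levels; len(sigmas) is the number of
                   model evaluations (1 gives single-step inference).
    """
    # Step 1: perturb the initial estimate at the largest noise level
    # and map it to a clean estimate in one model call.
    x = x_init + sigmas[0] * rng.standard_normal(x_init.shape)
    x = f(x, sigmas[0])

    # Optional extra steps: re-noise at a smaller level, then denoise again.
    for sigma in sigmas[1:]:
        noise = rng.standard_normal(x.shape)
        scale = np.sqrt(max(sigma**2 - sigma_min**2, 0.0))
        x = f(x + scale * noise, sigma)
    return x


# Toy usage with a stand-in model that pulls its input halfway
# toward a known clean signal (purely illustrative).
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
toy_model = lambda x, sigma: 0.5 * (x + clean)
rough = clean + 0.1 * rng.standard_normal(clean.shape)
one_step = multistep_consistency_sample(toy_model, rough, [1.0], rng)
two_step = multistep_consistency_sample(toy_model, rough, [1.0, 0.3], rng)
```

Each entry in `sigmas` costs one model evaluation, so the list length is the knob that trades compute for quality in this sketch.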

Tornike Karchkhadze, Mohammad Rasool Izadi, Shuo Zhang, Shlomo Dubnov • 2024

Related benchmarks

Task	Dataset	Metric	Result	Rank
Audio Source Separation	Slakh2100 (test)	SDR (Bass)	16.13	16
Music Source Separation	MUSDB18	SDR (Bass)	10.24	11
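For readers unfamiliar with the metric in the table, the following is a minimal sketch of a plain signal-to-distortion ratio in decibels. The listing does not say which SDR variant or evaluation toolkit produced the reported numbers (scale-invariant and improvement-based variants are also common), so this basic definition and the helper name `sdr_db` are assumptions for illustration only.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-8):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    reference -- ground-truth source waveform
    estimate  -- separated source waveform, same shape
    eps       -- small constant to avoid division by zero
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))


# An estimate with 10% amplitude error scores about 20 dB.
ref = np.ones(100)
est = 0.9 * ref
score = sdr_db(ref, est)
```

Higher values mean the estimate is closer to the reference source.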
