Noise-robust Speech Separation with Fast Generative Correction

About

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separated output for single-channel speech mixtures by removing noise and perceptually unnatural distortions. Furthermore, we optimize the generative model using a predictive loss, which streamlines the diffusion model's reverse process into a single step and rectifies errors introduced by the reverse process. Our method achieves state-of-the-art performance on the in-domain noisy Libri2Mix dataset and on out-of-domain WSJ with a variety of noises, improving SI-SNR by 22-35% relative to SepFormer and demonstrating robustness and strong generalization.

Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak • 2024
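
The abstract reports its gains as a relative improvement in SI-SNR (scale-invariant signal-to-noise ratio). The paper's evaluation code is not reproduced on this page, so the following is only a minimal sketch of the commonly used SI-SNR definition, plus the arithmetic behind a relative improvement. The function names `si_snr` and `relative_improvement` and the sample values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length.

    Assumption: this follows the commonly used definition; the paper's
    own evaluation script may differ in details such as eps handling.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target to get the target component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target

    # Everything left over is treated as noise or distortion.
    e_noise = estimate - s_target

    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


def relative_improvement(baseline_db, new_db):
    """Relative gain in percent of new_db over baseline_db."""
    return 100.0 * (new_db - baseline_db) / baseline_db


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)          # hypothetical clean source
    noisy = clean + 0.1 * rng.standard_normal(16000)  # hypothetical estimate
    print(f"SI-SNR: {si_snr(noisy, clean):.2f} dB")

    # Hypothetical scores, chosen only to illustrate the arithmetic.
    print(f"Relative gain: {relative_improvement(10.0, 12.5):.1f}%")
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is preferred over plain SNR for separation outputs whose gain is arbitrary.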

Related benchmarks

Task	Dataset	Result
Multi-channel Speech Separation	WSJ0 + WHAM! (in-domain)	PESQ 1.98	12
Multi-channel Speech Separation	Librispeech + DEMAND (out-of-domain)	PESQ 2.13	12
Text-prompted separation	Speaker	SAJ 2.66	9
Multi-channel Speech Separation	Low-resource languages + DEMAND (out-of-domain)	PESQ 1.7	4

