LibriMix: An Open-Source Dataset for Generalizable Speech Separation

About

In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown important performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix, and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!. Using Conv-TasNet, we achieve competitive performance on all LibriMix versions. In order to fairly evaluate across datasets, we introduce a third test set based on VCTK for speech and WHAM! for noise. Our experiments show that the generalization error is smaller for models trained with LibriMix than with WHAM!, in both clean and noisy conditions. Aiming towards evaluation in more realistic, conversation-like scenarios, we also release a sparsely overlapping version of LibriMix's test set.

Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent • 2020

Related benchmarks

Task	Dataset	Result
Audio-Visual Target Speaker Extraction	LRS2 2-mix (test)	DNSMOS 3.16	22
Speech Separation	WHAMR!	SI-SNRi 8.3	20
Speech Separation	WHAM!	SI-SNRi (dB) 12.7	15
Speaker Separation	WSJ0-2mix 8kHz (test)	ΔSDR 15.6	14
Speaker Separation	WSJ0-3mix 8kHz (test)	Delta SI-SDR 12.7	7
Speech Separation	Libri3mix noisy 8 kHz (test)	Delta SI-SDR 13.3	5
Speech Separation	Libri2mix clean 8 kHz (test)	Delta SI-SDR 14.7	5
Speech Separation	Libri2mix noisy 8 kHz (test)	ΔSI-SDR 12.6	5
Speech Separation	Libri3mix clean 8 kHz (test)	Delta SI-SDR 13.9	5
Source Separation	FECGSYNDB	Delta SI-SDR 11.4	3
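
Several rows above report an SI-SDR improvement (written "Delta SI-SDR" or "ΔSI-SDR"). As a rough illustration of what that number measures, the sketch below computes scale-invariant SDR in pure Python using the commonly used definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the residual). The function names `si_sdr` and `si_sdr_improvement` are hypothetical, the toy signals are made up for illustration, and the mean-removal step that many toolkits apply is omitted; this is not the evaluation code behind the reported results.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length sample sequences.

    Sketch only: no mean removal, no batching, plain Python floats.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference must not be all zeros")

    # Optimal scaling of the reference that best explains the estimate.
    alpha = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [alpha * r for r in reference]

    # Everything in the estimate that is not the scaled reference.
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf  # perfect reconstruction up to a scale factor
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """Gain in SI-SDR (dB) of the separated estimate over the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy example: the interferer is orthogonal to the reference speaker.
reference = [1.0, -1.0, 1.0, -1.0]
interferer = [1.0, 1.0, -1.0, -1.0]
mixture = [r + i for r, i in zip(reference, interferer)]
# A hypothetical separator output that keeps 10% of the interferer.
estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]

improvement_db = si_sdr_improvement(estimate, mixture, reference)
```

Because the estimate is projected onto the reference before the energies are compared, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.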
