
Wavesplit: End-to-End Speech Separation by Speaker Clustering

About

We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the model infers a representation for each source and then estimates each source signal given the inferred representations. The model is trained to jointly perform both tasks from the raw waveform. Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation. For speech separation, our sequence-wide speaker representations provide a more robust separation of long, challenging recordings compared to prior work. Wavesplit redefines the state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2/3mix), as well as in noisy and reverberated settings (WHAM/WHAMR). We also set a new benchmark on the recent LibriMix dataset. Finally, we show that Wavesplit is also applicable to other domains, by separating fetal and maternal heart rates from a single abdominal electrocardiogram.
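
The permutation problem mentioned in the abstract arises because a separation model's outputs come in an arbitrary order, so output 1 may correspond to reference speaker 2 and vice versa. The sketch below only illustrates that problem; it is not Wavesplit's method. It resolves the ordering with a brute-force search over permutations (a swapped-in, commonly used alternative to the clustering of speaker representations that the abstract describes). The function names `mse` and `best_permutation` and the toy signals are assumptions made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_permutation(estimates, references):
    """Find the assignment of estimates to references with the lowest mean MSE.

    Returns (order, error), where order[i] is the index of the estimate
    matched to reference i. Brute force, so only practical for a few sources.
    """
    n = len(references)
    best_order, best_err = None, float("inf")
    for order in permutations(range(n)):
        err = sum(mse(estimates[j], references[i]) for i, j in enumerate(order)) / n
        if err < best_err:
            best_order, best_err = order, err
    return best_order, best_err


# Toy example (hypothetical data): the estimates are good but come out swapped.
references = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
estimates = [[0.0, 0.9, 0.0, -1.1], [1.1, 0.0, -0.9, 0.0]]

order, err = best_permutation(estimates, references)
print(order, err)
```

Scoring the estimates in their raw order would heavily penalize a model that separated the speakers well but emitted them in the other order, which is why separation systems need some mechanism for handling the ordering.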

Neil Zeghidour, David Grangier • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Separation | WSJ0-2Mix (test) | SDRi (dB) | 22.3 | 141 |
| Speech Separation | WSJ0-2Mix | SI-SNRi (dB) | 22.2 | 65 |
| Speech Separation | WHAM! (test) | SI-SNRi (dB) | 16 | 58 |
| Speech Separation | WHAMR! (test) | ΔSI-SNR | 13.2 | 57 |
| Speech Separation | Libri2Mix (test) | SI-SNRi (dB) | 19.5 | 45 |
| Speech Separation | WSJ0-3mix (test) | SI-SNRi | 17.8 | 29 |
| Speech Separation | WHAMR! | SI-SNRi | 13.2 | 20 |
| Source Separation | WSJ0-2Mix (test) | SI-SNRi | 22.2 | 17 |
| Speech Separation | WHAM! | SI-SNRi (dB) | 16 | 15 |
| Speaker Separation | WSJ0-2mix 8kHz (test) | ΔSDR | 22.3 | 14 |
Showing 10 of 26 rows
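
Most results above are reported as SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated estimate over the unprocessed mixture. The following is a minimal sketch of the standard SI-SNR definition in pure Python, not the evaluation code behind these numbers. The function names, the `eps` stabilizer, and the toy sinusoids are assumptions made for illustration.

```python
import math


def _zero_mean(x):
    """Remove the mean so the metric ignores any DC offset."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB of `estimate` with respect to `reference`."""
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Project the estimate onto the reference to get the target component.
    scale = _dot(est, ref) / (_dot(ref, ref) + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    return 10.0 * math.log10((_dot(target, target) + eps) / (_dot(noise, noise) + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Toy example (hypothetical data): two orthogonal sinusoids as "speakers".
n = 64
reference = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
interferer = [math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
mixture = [r + i for r, i in zip(reference, interferer)]
# An imperfect estimate that still leaks 10% of the interferer.
estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]

improvement = si_snr_improvement(estimate, mixture, reference)
print(f"SI-SNRi: {improvement:.2f} dB")  # approximately 20 dB for this toy input
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is the property the "scale-invariant" name refers to.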
