Single-Channel Multi-Speaker Separation using Deep Clustering

About

Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, yielding impressive results on speaker-independent multi-speaker separation. In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task. We first significantly improve upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB compared to the baseline of 6.0 dB for two-speaker separation, as well as a 7.1 dB SDR improvement for three-speaker separation. We then extend the model to incorporate an enhancement layer to refine the signal estimates, and perform end-to-end training through both the clustering and enhancement stages to maximize signal fidelity. We evaluate the results using automatic speech recognition. The new signal approximation objective, combined with end-to-end training, produces unprecedented performance, reducing the word error rate (WER) from 89.1% down to 30.8%. This represents a major advancement towards solving the cocktail party problem.
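
The abstract does not state the training objective behind the "discriminatively trained embeddings". As a rough illustration only, the sketch below assumes the affinity-matching loss usually associated with deep clustering: each time-frequency bin gets an embedding vector (rows of V) and a one-hot speaker assignment (rows of Y), and training minimizes the squared Frobenius norm of V Vᵀ − Y Yᵀ. The function names are hypothetical, and the code is plain Python on nested lists rather than the authors' implementation.

```python
def gram(a, b):
    """Return a^T b for matrices stored as lists of rows (N x D and N x C)."""
    out = [[0.0] * len(b[0]) for _ in range(len(a[0]))]
    for row_a, row_b in zip(a, b):
        for i, x in enumerate(row_a):
            for j, y in enumerate(row_b):
                out[i][j] += x * y
    return out


def frobenius_sq(m):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in m for x in row)


def deep_clustering_loss(embeddings, assignments):
    """Compute ||V V^T - Y Y^T||_F^2 without forming the N x N affinities.

    Uses the expansion ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2, which only
    needs small D x D, D x C, and C x C matrices.
    """
    return (
        frobenius_sq(gram(embeddings, embeddings))
        - 2.0 * frobenius_sq(gram(embeddings, assignments))
        + frobenius_sq(gram(assignments, assignments))
    )
```

Under this assumed objective the loss is zero when bins that belong to the same speaker share an embedding and bins from different speakers have orthogonal embeddings. At test time the learned embeddings would then be clustered (for example with k-means) to obtain one mask per speaker.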

Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey • 2016

Related benchmarks

Task | Dataset | Result | Rank
Speech Separation | WSJ0-2Mix (test) | -- | 141
Speech Separation | WSJ0-2Mix | SI-SNRi (dB): 10.8 | 65
Speech Separation | WSJ0-3mix (test) | SI-SNRi: 7.1 | 29
Voice Separation | WSJ0 3mix | SI-SNRi: 7.1 | 14
Speaker Separation | WSJ0-2mix 8kHz (test) | -- | 14
Speech Separation | WSJ0-3mix (clean) | Delta SI-SNR (dB): 7.1 | 12
Multi-speaker speech recognition | WSJ0-2mix 16 kHz (test) | WER: 30.8 | 8
Speaker Separation | WSJ0-3mix 8kHz (test) | Delta SI-SDR: 7.1 | 7
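
Several rows above report SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. The page does not say how these figures were computed, so the following is only a minimal sketch of the commonly used definition, written in plain Python; the function names `si_snr` and `si_snr_improvement` and the `eps` stabilizer are assumptions for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean of each signal so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the target component.
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    target = [scale * b for b in r]
    # Everything left over after the projection counts as noise.
    noise = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.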
