
Continuous Speech Separation with Conformer

About

Continuous speech separation plays a vital role in complicated speech-related tasks such as conversation transcription. The separation model extracts a single speaker's signal from mixed speech. In this paper, we use the transformer and conformer in lieu of recurrent neural networks in the separation system, as we believe capturing global information with the self-attention-based method is crucial for speech separation. Evaluated on the LibriCSS dataset, the conformer separation model achieves state-of-the-art results, with a relative 23.5% word error rate (WER) reduction from bi-directional LSTM (BLSTM) in the utterance-wise evaluation and a 15.4% WER reduction in the continuous evaluation.
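The "global information" the abstract refers to comes from scaled dot-product self-attention, where every spectrogram frame attends to every other frame, unlike an RNN that propagates context step by step. Below is a minimal NumPy sketch of that mechanism driving a per-frame separation mask; all names, dimensions, and the sigmoid masking head are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a frame sequence.

    x: (T, d) spectrogram-like features; wq/wk/wv: (d, d) projections.
    Every frame attends to every other frame, which is the 'global
    context' property the paper contrasts with recurrent models.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])          # (T, T) frame-to-frame affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over all frames
    return weights @ v                              # (T, d) context-mixed features

# Toy usage: 5 frames of 8-dim features -> a per-frame mask in (0, 1)
# that would be multiplied with the mixture spectrogram.
rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.standard_normal((T, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
h = self_attention(x, wq, wk, wv)
mask = 1.0 / (1.0 + np.exp(-h))   # sigmoid mask head (hypothetical)
print(mask.shape)                 # (5, 8)
```

A conformer block additionally interleaves convolution modules with this attention so the model captures local spectral patterns as well as global context.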

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Speech Separation | WSJ0-2Mix (test) | – | – | 160 |
| Speech Separation | WHAMR! (test) | ΔSI-SNR | 6.7 | 57 |
| Speech Separation | LibriCSS Utterance-wise v1 (test) | Score (0 Source Overlap) | 12.9 | 21 |
| Speech Separation | LibriCSS Continuous v1 (test) | Score (10%) | 16.3 | 20 |
| Continuous speech separation | LibriCSS 0S | WER (Hybrid) | 11.0 | 13 |
| Continuous speech separation | LibriCSS 0L | WER (Hybrid) | 8.7 | 13 |
| Continuous speech separation | LibriCSS 10% | WER (Hybrid) | 12.6 | 13 |
| Continuous speech separation | LibriCSS 20% | WER (Hybrid) | 13.5 | 13 |
| Continuous speech separation | LibriCSS 30% | WER (Hybrid) | 17.5 | 13 |
| Continuous speech separation | LibriCSS 40% | WER (Hybrid) | 19.6 | 13 |

Showing 10 of 14 rows
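The abstract's headline figures (23.5% and 15.4%) are *relative* WER reductions, i.e. the fraction of the baseline's errors that the new model removes, not absolute percentage-point differences. A one-liner makes the arithmetic concrete; the baseline and improved WERs below are hypothetical numbers chosen only to illustrate the formula.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction: fraction of the baseline's errors removed."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical illustration (not the paper's actual WERs):
# a BLSTM baseline at 10.0% WER improved to 7.65% WER.
print(round(relative_wer_reduction(10.0, 7.65), 3))  # 0.235 -> a 23.5% relative reduction
```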
