
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation

About

We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNNs) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on simulated room impulse responses (RIRs) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset.
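The two core operations described above can be sketched in a few lines: stacking the real and imaginary (RI) components of a multi-microphone STFT into the real-valued input tensor a DNN would consume, and computing MVDR beamforming weights from estimated target and noise spatial covariance matrices. This is a minimal illustrative sketch, not the authors' implementation; the tensor shapes, the reference-channel MVDR formulation, and the function names are assumptions.

```python
import numpy as np

def stack_ri_features(mc_stft):
    """Stack RI components of a multi-microphone complex STFT.

    mc_stft: complex array of shape (mics, frames, freqs).
    Returns a real array of shape (2 * mics, frames, freqs), the kind of
    input a complex-spectral-mapping DNN would map to the RI components
    of target speech at a reference microphone.
    """
    return np.concatenate([mc_stft.real, mc_stft.imag], axis=0)

def mvdr_weights(phi_s, phi_n, ref=0):
    """Reference-channel MVDR weights per frequency (a common formulation;
    assumed here, not taken from the paper).

    phi_s, phi_n: (freqs, mics, mics) target / noise spatial covariances.
    Returns w of shape (freqs, mics) with
    w = (Phi_n^{-1} Phi_s / tr(Phi_n^{-1} Phi_s)) e_ref.
    """
    num = np.linalg.solve(phi_n, phi_s)          # Phi_n^{-1} Phi_s, batched
    denom = np.trace(num, axis1=-2, axis2=-1)    # per-frequency trace
    return num[..., ref] / denom[:, None]        # pick reference column
```

With a rank-1 target covariance Phi_s = d d^H and identity noise covariance, these weights satisfy the distortionless constraint w^H d = d[ref], i.e. the target at the reference microphone passes undistorted.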

Zhong-Qiu Wang, Peidong Wang, DeLiang Wang • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Speech Separation | LibriCSS Utterance-wise v1 (test) | Score (0 source overlap) | 9.2 | 21 |
| Speech Separation | LibriCSS Continuous v1 (test) | Score (10%) | 13.2 | 20 |
| Speech Separation | WHAMR! 1CH | SI-SNRi (dB) | 10.3 | 11 |
