Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective

About

This study investigates phase reconstruction for deep-learning-based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key observation is that, for a mixture of two sources, with their magnitudes accurately estimated and under a geometric constraint, the absolute phase difference between each source and the mixture can be uniquely determined; in addition, the source phases at each time-frequency (T-F) unit can be narrowed down to only two candidates. To pick the right candidate, we propose three algorithms based on iterative phase reconstruction, group delay estimation, and phase-difference sign prediction. State-of-the-art results are obtained on the publicly available wsj0-2mix and wsj0-3mix corpora.
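
The geometric observation in the abstract can be illustrated with the law of cosines: at one T-F unit, the mixture and the two sources form a triangle in the complex plane, so the three magnitudes fix the angle between each source and the mixture up to sign. The sketch below is a minimal illustration under that assumption, not the authors' implementation; the function name `phase_candidates` and its arguments are hypothetical, and it does not cover any of the three candidate-selection algorithms.

```python
import cmath
import math


def phase_candidates(mixture, mag_a, mag_b):
    """Return the two candidate (phase_a, phase_b) pairs for one T-F unit.

    mixture : complex STFT value of the mixture at this T-F unit
    mag_a, mag_b : (estimated) magnitudes of the two sources
    Hypothetical helper for illustration only.
    """
    mag_y = abs(mixture)
    theta_y = cmath.phase(mixture)
    if mag_y == 0 or mag_a == 0 or mag_b == 0:
        raise ValueError("phase difference is undefined for a zero magnitude")

    # Law of cosines on the triangle formed by the mixture and both sources.
    cos_a = (mag_y**2 + mag_a**2 - mag_b**2) / (2 * mag_y * mag_a)
    cos_b = (mag_y**2 + mag_b**2 - mag_a**2) / (2 * mag_y * mag_b)

    # Clamp to [-1, 1]: estimated magnitudes may violate the triangle
    # inequality (the geometric constraint), which would break acos.
    delta_a = math.acos(max(-1.0, min(1.0, cos_a)))
    delta_b = math.acos(max(-1.0, min(1.0, cos_b)))

    # The sources lie on opposite sides of the mixture, so the two
    # phase differences always carry opposite signs.
    return [
        (theta_y + delta_a, theta_y - delta_b),
        (theta_y - delta_a, theta_y + delta_b),
    ]


# Usage: build a mixture from known sources, then recover the candidates.
source_a = cmath.rect(1.0, 0.3)
source_b = cmath.rect(0.8, -1.1)
candidates = phase_candidates(source_a + source_b, 1.0, 0.8)
```

Both candidates reconstruct the same mixture, since they are mirror images of each other about the mixture vector; the magnitudes alone cannot tell them apart, which is why a separate selection step is needed.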

Zhong-Qiu Wang, Ke Tan, DeLiang Wang • 2018

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Speech Separation | WSJ0-2Mix (test) | SDRi (dB): 15.6 | 141 |
| Speech Separation | WSJ0-2Mix | SI-SNRi (dB): 15.3 | 65 |
| Source Separation | WSJ0-2Mix (test) | SI-SNRi: 15.3 | 17 |
| Speaker Separation | WSJ0-2mix OC (test) | PESQ: 3.45 | 15 |