TF-CorrNet: Leveraging Spatial Correlation for Continuous Speech Separation

About

Multi-channel source separation has typically used inter-microphone phase differences (IPDs) concatenated with magnitude information in the time-frequency domain, or real and imaginary components stacked along the channel axis. However, the spatial information of a sound source is fundamentally contained in the differences between microphones, specifically in the correlation between them, while the power of each microphone also provides valuable information about the source spectrum, which is why the magnitude is also included. Therefore, we propose a network that directly leverages a correlation input with phase transform (PHAT)-beta to estimate the separation filter. In addition, the proposed TF-CorrNet processes the features alternately across the time and frequency axes, a dual-path strategy applied to the spatial information. Furthermore, we add a spectral module to model source-related direct time-frequency patterns for improved speech separation. Experimental results demonstrate that the proposed TF-CorrNet separates speech effectively, achieving high performance at low computational cost on the LibriCSS dataset.
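The correlation input mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that PHAT-beta means dividing each inter-microphone cross-spectrum by its own magnitude raised to the power beta, and that only distinct microphone pairs are used. The function name `phat_beta_correlation`, the argument layout, and the default `beta=0.5` are hypothetical choices made for illustration.

```python
import numpy as np


def phat_beta_correlation(stft, beta=0.5, eps=1e-8):
    """Pairwise inter-microphone correlation with PHAT-beta weighting.

    Hypothetical helper, not taken from the paper.
    stft: complex array of shape (mics, frames, freqs).
    Returns a complex array of shape (pairs, frames, freqs), where the
    pairs are ordered (0,1), (0,2), ..., following np.triu_indices.
    """
    n_mics = stft.shape[0]
    # Indices of all distinct microphone pairs (i < j).
    i_idx, j_idx = np.triu_indices(n_mics, k=1)
    # Cross-spectrum X_i * conj(X_j) for every pair and time-frequency bin.
    cross = stft[i_idx] * np.conj(stft[j_idx])
    # Assumed PHAT-beta weighting: beta=0 keeps the raw cross-spectrum,
    # beta=1 keeps only the phase (classic PHAT).
    weight = (np.abs(cross) + eps) ** beta
    return cross / weight


# Usage on random data standing in for a 3-microphone STFT.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 4)) + 1j * rng.standard_normal((3, 5, 4))
corr = phat_beta_correlation(x, beta=0.5)
```

Under this assumption, intermediate values of beta retain part of the magnitude information alongside the phase difference, which matches the abstract's point that microphone power also carries information about the source spectrum.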

Ui-Hyeop Shin, Bon Hyeok Ku, Hyung-Min Park • 2025

Related benchmarks

Task	Dataset	Result
Speech Separation	LibriCSS Utterance-wise v1 (test)	Score (0 Source Overlap): 5.8	21
Speech Separation	LibriCSS Continuous v1 (test)	Score (10%): 8.3	20
Speech Separation	LibriCSS simulated (test)	SDRi: 11.75	3
