Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation

About

Speech separation in realistic acoustic environments remains challenging because overlapping speakers, background noise, and reverberation must be resolved simultaneously. Although recent time-frequency (TF) domain models have shown strong performance, most still rely on late-split architectures, where speaker disentanglement is deferred to the final stage, creating an information bottleneck and weakening discriminability under adverse conditions. To address this issue, we propose SR-CorrNet, an asymmetric encoder-decoder framework that introduces the separation-reconstruction (SepRe) strategy into a TF dual-path backbone. The encoder performs coarse separation from mixture observations, while the weight-shared decoder progressively reconstructs speaker-discriminative features with cross-speaker interaction, enabling stage-wise refinement. To complement this architecture, we formulate speech separation as a structured correlation-to-filter problem: spatio-spectro-temporal correlations computed from the observations are used as input features, and the corresponding deep filters are estimated to recover target signals. We further incorporate an attractor-based dynamic split module to adapt the number of output streams to the actual speaker configuration. Experimental results on WSJ0-2/3/4/5Mix, WHAMR!, and LibriCSS demonstrate consistent improvements across anechoic, noisy-reverberant, and real-recorded conditions in both single- and multi-channel settings, highlighting the effectiveness of TF-domain SepRe with correlation-based filter estimation for speech separation.
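The correlation-to-filter idea in the abstract can be illustrated with a minimal sketch. It assumes a single-channel STFT, temporal lags only, and one particular conjugation convention. The abstract does not give the paper's exact definitions (spatial and spectral neighbours, normalization, filter shape), so the function names `temporal_correlation` and `apply_deep_filter` are hypothetical, and in practice the filter taps would come from a network rather than being supplied by hand.

```python
def temporal_correlation(mixture_tf, num_lags):
    """Correlate each TF bin with its own past frames (illustrative input feature).

    mixture_tf: complex STFT as nested lists with shape [T][F].
    Returns corr[t][f][lag] = X[t][f] * conj(X[t - lag][f]); frames before
    the start are treated as zero.
    """
    num_frames = len(mixture_tf)
    num_bins = len(mixture_tf[0])
    corr = [[[0j] * num_lags for _ in range(num_bins)] for _ in range(num_frames)]
    for t in range(num_frames):
        for f in range(num_bins):
            for lag in range(num_lags):
                if t - lag >= 0:
                    corr[t][f][lag] = mixture_tf[t][f] * mixture_tf[t - lag][f].conjugate()
    return corr


def apply_deep_filter(mixture_tf, filters):
    """Apply per-bin multi-frame complex filters to the mixture (illustrative output stage).

    filters: complex taps with shape [T][F][L]; tap `lag` weights the frame
    `lag` steps in the past. Returns the filtered spectrogram [T][F].
    """
    num_frames = len(mixture_tf)
    num_bins = len(mixture_tf[0])
    out = [[0j] * num_bins for _ in range(num_frames)]
    for t in range(num_frames):
        for f in range(num_bins):
            acc = 0j
            for lag, tap in enumerate(filters[t][f]):
                if t - lag >= 0:  # zero-padding before the first frame
                    acc += tap * mixture_tf[t - lag][f]
            out[t][f] = acc
    return out
```

In this sketch a separation model would map the output of `temporal_correlation` to one set of `filters` per speaker, and `apply_deep_filter` would then be called once per speaker on the same mixture.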

Ui-Hyeop Shin, Hyung-Min Park • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Speech Separation | WSJ0-2Mix anechoic clean mixture (test) | SI-SNRi 25.5 | 23 |
| Speech Separation | LibriCSS Utterance-wise v1 (test) | Score (0 Source Overlap) 6.2 | 21 |
| Speech Separation | LibriCSS Continuous v1 (test) | Score (10%) 7 | 20 |
| Speech Separation | WSJ0 3mix | SI-SNRi 24.5 | 17 |
| Speech Separation | WHAMR! 1CH | SI-SNRi (dB) 19.7 | 11 |
| Speech Separation | WSJ0 4mix | SI-SNRi 22.1 | 9 |
| Speech Separation | WSJ0 5mix | SI-SNRi 20.4 | 8 |
| Speech Separation | WHAMR! 2CH | SI-SNRi (dB) 21.8 | 6 |
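For readers unfamiliar with the SI-SNRi metric used in the table above, the following is a minimal sketch of the standard scale-invariant SNR definition and its improvement over the unprocessed mixture. It is written in pure Python for clarity and is not the authors' evaluation code; the function names and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)
    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the scaled target component.
    dot = sum(a * b for a, b in zip(e, t))
    scale = dot / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]
    # Whatever is left after removing the target component counts as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]
    num = sum(x * x for x in s_target)
    den = sum(x * x for x in e_noise)
    return 10.0 * math.log10((num + eps) / (den + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the raw mixture, in dB."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.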