
Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multi-channel Cross-Correlations

About

Many speaker localization methods can be found in the literature. However, speaker localization under strong reverberation remains a major challenge in real-world applications. This paper proposes two algorithms for localizing speakers using microphone array recordings of reverberated sounds. To separate concurrent speakers, the first algorithm decomposes microphone signals spectrotemporally into subbands via an auditory filterbank. To suppress reverberation, we propose a novel speech onset detection approach derived from the speech signal and impulse response models, and further propose to formulate the multi-channel cross-correlation coefficient (MCCC) of encoded speech onsets in each subband. The subband results are combined to estimate the directions-of-arrival (DOAs) of speakers. The second algorithm extends the generalized cross-correlation with phase transform (GCC-PHAT) method by using redundant information from multiple microphones to address the reverberation problem. The proposed methods have been evaluated under adverse conditions using not only simulated signals (reverberation time $T_{60}$ of up to $1$ s) but also recordings in a real reverberant room ($T_{60} \approx 0.65$ s). Compared with several state-of-the-art localization methods, experimental results confirm that the proposed methods can reliably locate static and moving speakers in the presence of reverberation.
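For orientation, the sketch below shows the standard two-microphone GCC-PHAT delay estimator that the second algorithm takes as its starting point. It is not the paper's multi-microphone extension, and it does not include the onset detection or MCCC stages. The function names `gcc_phat` and `delay_to_doa`, the far-field two-microphone geometry, and the speed of sound of 343 m/s are assumptions made for illustration only.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay (in seconds) of x2 relative to x1 using GCC-PHAT.

    A positive result means x2 lags x1. Illustrative baseline only.
    """
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)      # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def delay_to_doa(tau, mic_distance, c=343.0):
    """Convert a delay to a broadside angle in degrees (assumed far-field, two mics)."""
    ratio = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    s = rng.standard_normal(1024)
    delayed = np.concatenate((np.zeros(5), s[:-5]))   # s delayed by 5 samples
    tau = gcc_phat(s, delayed, fs)
    print(tau * fs, delay_to_doa(tau, mic_distance=0.2))
```

The PHAT weighting whitens the cross-spectrum so that the correlation peak stays sharp, which is why it is a common baseline in reverberant rooms. With only two microphones, a strong early reflection can still produce a competing peak, which is the limitation the abstract says the multi-microphone extension is meant to address.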

Shoufeng Lin • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| DOA estimation | Simulated DOA Data SNR=∞, T60=0.2s | RMSE 0.82 | 5 |
| DOA estimation | Simulated DOA Data SNR=∞, T60=0.4s | RMSE 1 | 5 |
| DOA estimation | Simulated DOA Data SNR=∞, T60=0.6s | RMSE 1 | 5 |
| DOA estimation | Simulated DOA Data SNR=40dB, T60=0.2s | RMSE 0.82 | 5 |
| DOA estimation | Simulated DOA Data (SNR=40dB, T60=0.6s) | RMSE 1.29 | 5 |
| DOA estimation | Simulated DOA Data SNR=∞, T60=1s | RMSE 1 | 4 |
| DOA estimation | Simulated DOA Data SNR=40dB, T60=0.4s | RMSE 1 | 4 |
| DOA estimation | Simulated DOA Data SNR=40dB, T60=1s | RMSE 0.82 | 4 |
