A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification

About

Automatic Speaker Verification (ASV) suffers from performance degradation in noisy conditions. To address this issue, we propose a novel adversarial learning framework that incorporates noise disentanglement to establish a noise-independent, speaker-invariant embedding space. Specifically, the disentanglement module includes two encoders that separate speaker-related information from speaker-irrelevant information. The reconstruction module serves as a regularization term to constrain the noise. A feature-robust loss is also used to supervise the speaker encoder so that it learns noise-independent speaker embeddings without losing speaker information. In addition, adversarial training is introduced to discourage the speaker encoder from encoding acoustic condition information, with the aim of achieving a speaker-invariant embedding space. Experiments on VoxCeleb1 indicate that the proposed method improves the performance of the speaker verification system under both clean and noisy conditions.
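To make the interplay of the terms named in the abstract concrete, the following is a minimal sketch of how a speaker-encoder objective could combine them. It is not the authors' formulation: the function names, the use of mean squared error for the reconstruction and feature-robust terms, and the weights w_rec, w_fr and w_adv are all assumptions chosen for illustration.

```python
# Hypothetical sketch: loss forms, names, and weights are assumptions,
# not taken from the paper.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def encoder_objective(cls_loss, noisy_input, reconstruction,
                      clean_emb, noisy_emb, adv_loss,
                      w_rec=1.0, w_fr=1.0, w_adv=0.1):
    """Combine the four terms the abstract names into one scalar.

    cls_loss       -- speaker classification loss (already computed)
    noisy_input    -- input features fed to both encoders
    reconstruction -- decoder output rebuilt from speaker and noise codes
    clean_emb      -- speaker embedding of the clean utterance
    noisy_emb      -- speaker embedding of its noisy counterpart
    adv_loss       -- loss of the acoustic-condition discriminator
    """
    # Reconstruction regularizer: the two codes together should explain the input.
    rec = mse(noisy_input, reconstruction)
    # Feature-robust term (assumed form): clean and noisy embeddings should agree.
    fr = mse(clean_emb, noisy_emb)
    # The discriminator loss enters with a minus sign: the speaker encoder
    # is rewarded when the acoustic-condition classifier fails.
    return cls_loss + w_rec * rec + w_fr * fr - w_adv * adv_loss
```

In a real training loop the minus sign is commonly realized with a gradient reversal layer or alternating updates; which of these the paper uses is not stated in the abstract.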

Xujiang Xing, Mingxing Xu, Thomas Fang Zheng • 2024

Related benchmarks

Task	Dataset	Metric	Value
Speaker Verification	VoxCeleb1 with MUSAN noise (test)	EER	2.63	187
Speaker Verification	VoxCeleb1-O Cleaned (Original)	EER (%)	2.63	61
Speaker Verification	VoxCeleb1 with Nonspeech100 (test)	EER (%)	2.99	36
Speaker Verification	Vox1-O Noise (test)	Error Rate (0 dB)	5.87	18
Speaker Verification	Vox1 Overall O (test)	Average EER	4.67	9
Speaker Verification	Vox1 Music O (test)	Error Rate (0 dB SNR)	7.07	9
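
Most rows above report the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of how such a number is obtained from trial scores, here is a minimal threshold-sweep sketch. The function name and the toy scores are hypothetical, and evaluation toolkits typically interpolate the ROC curve rather than picking the closest threshold as done here.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER as a fraction in [0, 1] by sweeping thresholds.

    target_scores    -- similarity scores of same-speaker trials
    nontarget_scores -- similarity scores of different-speaker trials
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores (made up for illustration); multiply by 100 for a percentage.
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

With only a handful of trials the result is coarse; on a full trial list the sweep approaches the interpolated value.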
