Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

About

Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages a DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerant speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems.

Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu · 2023

Related benchmarks

Task	Dataset	Metric	Result
Speaker Verification	VoxCeleb1 with MUSAN noise (test)	EER	2.35	187
Speaker Verification	VoxCeleb1-O Cleaned (Original)	EER (%)	2.35	61
Speaker Verification	VoxCeleb1 with Nonspeech100 (test)	EER (%)	2.89	36
Speaker Verification	Vox1-O Noise (test)	Error Rate (0 dB)	6.01	18
Speaker Verification	Vox1 Music O (test)	Error Rate (0 dB SNR)	6.04	9
Speaker Verification	Vox1 Overall O (test)	Average EER	4.61	9
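
Most of the results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a reading aid, the following is a minimal sketch of how EER can be computed from a list of trial scores. It is not the authors' evaluation code: the function name equal_error_rate, the threshold sweep over observed scores, and the toy trial list are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER (as a fraction) from similarity scores.

    scores: one similarity score per trial (higher = more likely same speaker).
    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    Illustrative helper only; not taken from the Diff-SV paper.
    """
    target = [s for s, y in zip(scores, labels) if y == 1]
    nontarget = [s for s, y in zip(scores, labels) if y == 0]
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(scores)):
        # False rejection: a target trial scoring below the threshold.
        frr = sum(s < t for s in target) / len(target)
        # False acceptance: a non-target trial scoring at or above it.
        far = sum(s >= t for s in nontarget) / len(nontarget)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical toy trials (not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(f"EER = {100 * equal_error_rate(toy_scores, toy_labels):.2f}%")  # EER = 25.00%
```

With only a handful of trials the two error curves rarely cross exactly, so this sketch averages them at the closest threshold. Evaluation toolkits typically interpolate along the ROC curve instead, which can give slightly different values.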

