Universal Score-based Speech Enhancement with High Content Preservation

About

We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high-quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content preservation in the enhanced speech. In the experiments, we train a universal enhancement model on a large-scale dataset of speech degraded by noise, reverberation, and various distortions. The results on multiple public benchmark datasets demonstrate that UNIVERSE++ compares favorably to both discriminative and generative baselines for a wide range of qualitative and intelligibility metrics.
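The abstract does not spell out the low-rank adaptation scheme, so the following is only a generic sketch of the low-rank adaptation idea, not the authors' implementation. It assumes the common formulation in which a frozen weight matrix W is adjusted by a trainable product B·A of inner rank r, scaled by alpha / r. All names here (`lora_weight`, `init_lora`, `alpha`, the matrix sizes) are hypothetical and chosen for illustration; the phoneme fidelity loss and the diffusion model itself are not shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def init_lora(d_out, d_in, r, seed=0):
    """Create the two low-rank factors (hypothetical initialization).

    A (r x d_in) gets small random values; B (d_out x r) starts at zero,
    so the adapted weight initially equals the frozen base weight.
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, leaving W itself untouched."""
    r = len(a)  # inner rank shared by A and B
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def count_params(m):
    """Number of scalar entries in a matrix."""
    return sum(len(row) for row in m)


# Example: an 8 x 8 base weight adapted with rank-2 factors.
d_out, d_in, r = 8, 8, 2
w = [[float(i == j) for j in range(d_in)] for i in range(d_out)]
a, b = init_lora(d_out, d_in, r)
adapted = lora_weight(w, a, b, alpha=4.0)
print(count_params(a) + count_params(b), "trainable vs", count_params(w), "frozen")
```

In this toy setting only the 32 entries of A and B would be trained while the 64 entries of W stay fixed, which is the general appeal of low-rank adaptation: a pretrained model can be nudged toward a new objective with few additional parameters.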

Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu • 2024

Related benchmarks

Task	Dataset	Result
Speech Enhancement	VoiceBank-DEMAND	PESQ 3.02	55
General Speech Restoration	DNS-Real Out-Domain (test)	SIG 3.037	17
Speech Enhancement	WSJ0 UNI	PESQ 2.66	15
Speech Enhancement	VB-DMD	DNSMOS 3.45	15
Speech Restoration	DNS Challenge real-recording 2020	DNSMOS Score 2.64	14
General Speech Restoration	URGENT 2025 (test)	UTMOS 1.88	14
Speech Enhancement	VB-Demand In-Domain (test)	PESQ 3.02	13
Speech Enhancement	URGENT Speech Enhancement Challenge 50-sample 2024 (test)	MOS 3.73	12
Speech Enhancement	URGENT 2024 (test)	PESQ 3.09	12
Speech Denoising	VBDMD (test)	PESQ 3.03	12

Showing 10 of 36 rows
