
Universal Score-based Speech Enhancement with High Content Preservation

About

We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high-quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content preservation in the enhanced speech. In the experiments, we train a universal enhancement model on a large-scale dataset of speech degraded by noise, reverberation, and various distortions. The results on multiple public benchmark datasets demonstrate that UNIVERSE++ compares favorably to both discriminative and generative baselines for a wide range of quality and intelligibility metrics.
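
The abstract mentions a low-rank adaptation scheme without giving details. As a rough illustration of the general idea only, the sketch below shows generic low-rank adaptation of a single linear layer: a frozen weight W is augmented with a trainable update B·A of small rank. The class name `LoRALinear`, the `alpha / rank` scaling, and the initialization are assumptions borrowed from common low-rank adaptation practice, not taken from the paper; which layers UNIVERSE++ adapts and how the phoneme fidelity loss is computed are not specified here.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (out x in) plus a trainable low-rank update B @ A.

    Hypothetical illustration: the names, rank, and scaling are assumptions,
    not the UNIVERSE++ implementation.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # pretrained weight, kept frozen
        self.scale = alpha / rank     # common scaling convention (assumed)
        # A starts small and random, B starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)                # W x
        update = matvec(self.B, matvec(self.A, x))   # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Count only the parameters of A and B; W is not trained."""
        return (len(self.A) * len(self.A[0])
                + len(self.B) * len(self.B[0]))
```

With a 2x3 weight and rank 1, only 5 parameters (3 in A, 2 in B) would be updated during fine-tuning, compared with 6 in the full weight; the saving grows quickly with layer size. In a setup like the one the abstract describes, a content-oriented loss would drive the updates to A and B while the original weights stay fixed.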

Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speech Enhancement | WSJ0 UNI | PESQ 2.66 | 15 |
| Speech Enhancement | URGENT Speech Enhancement Challenge 50-sample 2024 (test) | MOS 3.73 | 12 |
| Speech Enhancement | URGENT 2024 (test) | PESQ 3.09 | 12 |
| Speech Denoising | VBDMD (test) | PESQ 3.03 | 12 |
| Speech Super-resolution | VBDMD-SR (test) | PESQ 3.01 | 10 |
| General Speech Restoration | DNS-Real Out-Domain (test) | SIG 2.999 | 9 |
| Speech Enhancement | WSJ0-CHiME3 Out-Domain (test) | PESQ 1.32 | 7 |
| General Speech Restoration | Voicefixer-GSR In-Domain (test) | SIG 3.275 | 7 |
| General Speech Restoration | DNS-with-Reverb Out-Domain (test) | SIG 2.548 | 7 |
| Speech Enhancement | VB-Demand In-Domain (test) | PESQ 3.02 | 6 |
Showing 10 of 24 rows
