Universal Score-based Speech Enhancement with High Content Preservation
About
We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model, which decouples clean speech feature extraction from diffusion. Our contributions are threefold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high-quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content preservation in the enhanced speech. In the experiments, we train a universal enhancement model on a large-scale dataset of speech degraded by noise, reverberation, and various distortions. The results on multiple public benchmark datasets demonstrate that UNIVERSE++ compares favorably to both discriminative and generative baselines across a wide range of quality and intelligibility metrics.
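The abstract names low-rank adaptation as the mechanism behind the content-preservation stage. As a hedged illustration of the generic LoRA technique only (not the UNIVERSE++ code), the sketch below freezes a pretrained weight `W` and learns a rank-`r` update `B A`, scaled by `alpha / r`. The class name `LoRALinear`, the rank, the scaling, and the initialization are assumptions taken from the common LoRA recipe. Which layers UNIVERSE++ adapts, and the phoneme fidelity loss that drives the update, are not described here and are not modeled.

```python
"""Minimal sketch of a generic LoRA (low-rank adaptation) linear layer.

Hypothetical illustration: rank, alpha, and initialization follow the common
LoRA recipe and are NOT taken from UNIVERSE++.
"""
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical layer computing y = W x + (alpha / r) * B (A x).

    W is frozen; only the low-rank factors A (r x in) and B (out x r) would be
    trained during adaptation.
    """

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, shape (out, in)
        out_dim, in_dim = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A is small random and B is zero, so the adapted layer initially
        # reproduces the frozen pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path
        delta = matvec(self.B, matvec(self.A, x))    # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def merged_weight(self):
        """Fold the update into W for inference: W + scale * B A."""
        ba = matmul(self.B, self.A)
        return [[w + self.scale * u for w, u in zip(wr, ur)]
                for wr, ur in zip(self.weight, ba)]

    def num_trainable(self):
        """Number of adapter parameters: r * (in + out)."""
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
layer = LoRALinear(W, rank=1)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))  # [7.0, -1.5], identical to W x because B starts at zero
```

Because `B` starts at zero, fine-tuning begins exactly at the pretrained model, and after training the low-rank update can be merged into `W`, so inference costs nothing extra. For a `d_out × d_in` layer the adapter adds only `r * (d_in + d_out)` trainable parameters instead of `d_in * d_out`.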
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Enhancement | WSJ0 UNI | PESQ | 2.66 | 15 |
| Speech Enhancement | URGENT Speech Enhancement Challenge 50-sample 2024 (test) | MOS | 3.73 | 12 |
| Speech Enhancement | URGENT 2024 (test) | PESQ | 3.09 | 12 |
| Speech Denoising | VBDMD (test) | PESQ | 3.03 | 12 |
| Speech Super-resolution | VBDMD-SR (test) | PESQ | 3.01 | 10 |
| General Speech Restoration | DNS-Real Out-Domain (test) | SIG | 2.999 | 9 |
| Speech Enhancement | WSJ0-CHiME3 Out-Domain (test) | PESQ | 1.32 | 7 |
| General Speech Restoration | Voicefixer-GSR In-Domain (test) | SIG | 3.275 | 7 |
| General Speech Restoration | DNS-with-Reverb Out-Domain (test) | SIG | 2.548 | 7 |
| Speech Enhancement | VB-Demand In-Domain (test) | PESQ | 3.02 | 6 |