
Conditional Diffusion Probabilistic Model for Speech Enhancement

About

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and investigate the generalization capability of our models to other datasets with noise characteristics unseen during training.
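To make the idea of feeding the noisy observation into the diffusion process concrete, here is a minimal sketch of one possible conditional forward step. It is not taken from this page. The Gaussian form, the interpolation weight m, the variance expression, the linear beta schedule, and the names cumulative_alphas and conditional_forward_sample are all illustrative assumptions. The only property the sketch relies on is that setting m = 0 recovers the standard, unconditional diffusion step.

```python
import math
import random


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for each step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def conditional_forward_sample(clean, noisy, alpha_bar, m, rng):
    """Draw x_t from an ASSUMED conditional forward process.

    Mean:     sqrt(alpha_bar) * ((1 - m) * clean + m * noisy)
    Variance: (1 - alpha_bar) - m**2 * alpha_bar   (assumed form)
    With m = 0 this is the standard unconditional diffusion step.
    """
    variance = (1.0 - alpha_bar) - m ** 2 * alpha_bar
    if variance < 0.0:
        raise ValueError("m is too large for this alpha_bar")
    scale = math.sqrt(alpha_bar)
    std = math.sqrt(variance)
    return [
        scale * ((1.0 - m) * x0 + m * y) + std * rng.gauss(0.0, 1.0)
        for x0, y in zip(clean, noisy)
    ]


# Toy usage: every value below is hypothetical, chosen only for illustration.
rng = random.Random(0)
betas = [1e-4 + i * (0.05 - 1e-4) / 49 for i in range(50)]  # linear schedule
alpha_bars = cumulative_alphas(betas)
clean = [math.sin(0.1 * n) for n in range(16)]            # stand-in clean signal
noisy = [x + 0.3 * rng.gauss(0.0, 1.0) for x in clean]    # stand-in noisy signal
x_t = conditional_forward_sample(clean, noisy, alpha_bars[-1], m=0.5, rng=rng)
```

In this sketch, a larger m pulls the mean of x_t toward the noisy observation rather than toward pure Gaussian noise around the clean signal, which is one way a process could carry information about real, non-Gaussian noise.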

Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Speech Enhancement | VoiceBank + DEMAND (VB-DMD) (test) | PESQ 2.52 | 105 |
| Speech Enhancement | VoiceBank-DEMAND (test) | PESQ 2.52 | 96 |
| Automatic Speech Recognition | ATC Corpus | CER (DS2) 10.45 | 27 |
| Speech Enhancement | ATC Corpus | CSIG 3.7 | 19 |
| Speech Enhancement | ATC Corpus (selected samples) | MOS SIG 3.55 | 18 |
| Speech Enhancement | WSJ0 UNI | PESQ 1.97 | 15 |
| Speech Denoising | VBDMD (test) | PESQ 2.48 | 12 |
| Speech Enhancement | URGENT 2024 (test) | PESQ 2.4 | 12 |
| Speech Enhancement | URGENT Speech Enhancement Challenge 50-sample 2024 (test) | MOS 2.84 | 12 |
| Speech Super-resolution | VBDMD-SR (test) | PESQ 2.66 | 10 |
Showing 10 of 15 rows
