Schrödinger Bridge for Generative Speech Enhancement

About

This paper proposes a generative speech enhancement model based on the Schrödinger bridge (SB). The proposed model employs a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distribution. The model is trained with a data prediction loss, which aims to recover the complex-valued clean speech coefficients, and an auxiliary time-domain loss is used to improve training. The effectiveness of the proposed SB-based model is evaluated on two speech enhancement tasks: speech denoising and speech dereverberation. The experimental results demonstrate that the proposed SB-based model outperforms diffusion-based models in terms of speech quality metrics and ASR performance, e.g., yielding a relative word error rate reduction of 20% for denoising and 6% for dereverberation compared to the best baseline model. The proposed model also demonstrates improved efficiency, achieving better quality than the baselines for the same number of sampling steps and at a reduced computational cost.
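The relative word error rate (WER) reductions quoted above are ratios against the baseline WER, not differences in percentage points. A minimal sketch of that arithmetic follows; the function name and the WER values are hypothetical, chosen only to illustrate the calculation, and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WER values in percent (assumptions, not results from the paper).
baseline_wer = 10.0
proposed_wer = 8.0

reduction = relative_wer_reduction(baseline_wer, proposed_wer)
print(f"Relative WER reduction: {reduction:.0%}")
```

With these illustrative numbers, a drop from 10% to 8% WER is 2 percentage points in absolute terms but a 20% relative reduction.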

Ante Jukić, Roman Korostik, Jagadeesh Balam, Boris Ginsburg • 2024

Related benchmarks

Task                    | Dataset                    | Result        | Rank
Speech Enhancement      | VoiceBank-DEMAND (test)    | PESQ 2.91     | 96
Speech Dereverberation  | WSJ0-Reverb (test)         | PESQ 2.68     | 12
Speech Denoising        | WSJ0-CHiME3 (test)         | PESQ 2.62     | 8
Speech Enhancement      | DNS3 (test)                | SI-SNR 14.959 | 8
Speech Enhancement      | VB-Demand In-Domain (test) | PESQ 2.14     | 6
