VoiceBridge: General Speech Restoration with One-step Latent Bridge Models

About

Bridge models have been investigated in speech enhancement but are mostly single-task, with limited general speech restoration (GSR) capability. In this work, we propose VoiceBridge, a one-step latent bridge model (LBM) for GSR, capable of efficiently reconstructing 48 kHz fullband speech from diverse distortions. To inherit the advantages of data-domain bridge models, we design an energy-preserving variational autoencoder that enhances waveform-latent space alignment over varying energy levels. By compressing waveforms into continuous latent representations, VoiceBridge models various GSR tasks with a single latent-to-latent generative process backed by a scalable transformer. To alleviate the challenge of reconstructing the high-quality target from distinctively different low-quality priors, we propose a joint neural prior for GSR, uniformly reducing the burden on the LBM across diverse tasks. Building upon these designs, we further investigate the bridge training objective by jointly tuning the LBM, decoder, and discriminator, transforming the model from a denoiser into a generator and enabling one-step GSR without distillation. Extensive validation across in-domain tasks (e.g., denoising and super-resolution), out-of-domain tasks (e.g., refining synthesized speech), and datasets demonstrates the superior performance of VoiceBridge. Demos: https://VoiceBridgedemo.github.io/.

Chi Zhang, Kaiwen Zheng, Zehua Chen, Jun Zhu • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
General Speech Restoration | DNS-Real Out-Domain (test) | SIG | 3.473 | 9
Speech Denoising | WSJ0-CHiME3 (test) | PESQ | 1.74 | 8
Bandwidth extension | VCTK-BWE BW=2K (test) | WVMOS | 4.306 | 7
General Speech Restoration | Voicefixer-GSR In-Domain (test) | SIG | 3.494 | 7
General Speech Restoration | DNS-with-Reverb Out-Domain (test) | SIG | 3.581 | 7
Bandwidth extension | VCTK-BWE BW=4K (test) | WVMOS | 4.404 | 7
Speech Enhancement | WSJ0-CHiME3 Out-Domain (test) | PESQ | 1.742 | 7
Bandwidth extension | VCTK-BWE BW=1K (test) | WVMOS | 4.154 | 6
Dereverberation | WSJ0-Reverb (test) | WVMOS | 4.403 | 6
Speech Enhancement | VB-Demand In-Domain (test) | PESQ | 2.831 | 6
Showing 10 of 23 rows
