DisSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration

About

Prior work on speech restoration (SR) has focused primarily on single-task speech restoration (SSR), which cannot address the general speech restoration problem. Training a separate SSR model for each distortion type is time-consuming and does not generalize. In addition, most studies ignore model generalization to unseen domains. To overcome these limitations, we propose DisSR, a general speech restoration model based on Disentangling Speech Representation, with two properties: 1) Degradation-prior guidance, which extracts a speaker-invariant degradation representation to guide the diffusion-based speech restoration model. 2) Domain adaptation, for which we design cross-domain alignment training to improve the model's adaptability and generalization on cross-domain data. Experimental results demonstrate that our method produces high-quality restored speech under various distortion conditions. Audio samples can be found at https://itspsp.github.io/DisSR.

Ziqi Liang, Zhijun Jia, Chang Liu, Minghui Yang, Zhihong Lu, Jian Wang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Speech Restoration | VCTK EN (test) | DNSMOS 3.75 | 5 |
| Speech Restoration | AISHELL-3 ZH (test) | DNSMOS 3.52 | 5 |
| Speech Restoration | JSUT JP (test) | DNSMOS 3.57 | 5 |
| Bandwidth extension | VCTK | CSIG 3.6 | 4 |
| Denoising | VCTK | CSIG 3.48 | 4 |
| Dereverberation | VCTK | CSIG 3.11 | 3 |