Universal Speech Enhancement with Regression and Generative Mamba
About
The Interspeech 2025 URGENT Challenge aimed to advance universal, robust, and generalizable speech enhancement by unifying speech enhancement tasks across a wide variety of conditions, including seven different distortion types and five languages. We present Universal Speech Enhancement Mamba (USEMamba), a state-space speech enhancement model designed to handle long-range sequence modeling, time-frequency structured processing, and sampling frequency-independent feature extraction. Our approach primarily relies on regression-based modeling, which performs well across most distortions. However, for packet loss and bandwidth extension, where missing content must be inferred, a generative variant of the proposed USEMamba proves more effective. Despite being trained on only a subset of the full training data, USEMamba achieved 2nd place in Track 1 during the blind test phase, demonstrating strong generalization across diverse conditions.
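The split between the regression and generative variants can be pictured as a simple routing rule. The sketch below is only an illustration of that idea, not the authors' pipeline: the names `Variant`, `GENERATIVE_DISTORTIONS`, and `select_variant` are hypothetical, and it assumes a distortion label is available at inference time, which the abstract does not state.

```python
from enum import Enum


class Variant(Enum):
    """Hypothetical labels for the two USEMamba variants named in the abstract."""
    REGRESSION = "regression"
    GENERATIVE = "generative"


# The two distortions the abstract says are better served by the generative variant.
# The string labels themselves are an assumption made for this sketch.
GENERATIVE_DISTORTIONS = frozenset({"packet_loss", "bandwidth_extension"})


def select_variant(distortion: str) -> Variant:
    """Pick a variant for a distortion label (assumed to be known in advance)."""
    # Normalize labels such as "Packet Loss" or "bandwidth-extension".
    key = distortion.strip().lower().replace(" ", "_").replace("-", "_")
    if key in GENERATIVE_DISTORTIONS:
        return Variant.GENERATIVE
    # Every other distortion falls back to the regression-based model.
    return Variant.REGRESSION
```

For example, `select_variant("Packet Loss")` selects the generative variant, while any label outside the two listed distortions selects the regression variant.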
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speech Enhancement | URGENT Challenge 2025 (non-blind test) | DNSMOS 3.01 | 19 |
| General Speech Restoration | DNS-Real Out-Domain (test) | SIG 3.239 | 17 |
| Universal Speech Enhancement | URGENT non-blind 2025 (test) | DNSMOS 3.01 | 9 |
| Speech Restoration | CCF-AATC Challenge 2025 (test) | SIG 3.36 | 7 |
| General Speech Restoration | URGENT 2025 (val) | SCOREQ 1.77 | 7 |
| General Speech Restoration | URGENT 2025 (test) | SCOREQ 1.6 | 7 |
| General Speech Restoration | VCTK-GSR (test) | SCOREQ 1.87 | 7 |