Universal Speech Enhancement with Regression and Generative Mamba

About

The Interspeech 2025 URGENT Challenge aimed to advance universal, robust, and generalizable speech enhancement by unifying speech enhancement tasks across a wide variety of conditions, including seven different distortion types and five languages. We present Universal Speech Enhancement Mamba (USEMamba), a state-space speech enhancement model designed to handle long-range sequence modeling, time-frequency structured processing, and sampling frequency-independent feature extraction. Our approach primarily relies on regression-based modeling, which performs well across most distortions. However, for packet loss and bandwidth extension, where missing content must be inferred, a generative variant of the proposed USEMamba proves more effective. Despite being trained on only a subset of the full training data, USEMamba achieved 2nd place in Track 1 during the blind test phase, demonstrating strong generalization across diverse conditions.

Rong Chao, Rauf Nasretdinov, Yu-Chiang Frank Wang, Ante Jukić, Szu-Wei Fu, Yu Tsao • 2025

Related benchmarks

Task	Dataset	Result
Universal Speech Enhancement	URGENT non-blind 2025 (test)	DNSMOS 3.01	26
Speech Enhancement	URGENT Challenge 2025 (non-blind test)	DNSMOS 3.01	19
General Speech Restoration	DNS-Real Out-Domain (test)	SIG 3.239	17
General Speech Restoration	URGENT 2025 (test)	UTMOS 1.88	14
Speech Restoration	CCF-AATC Challenge 2025 (test)	SIG 3.36	7
General Speech Restoration	URGENT 2025 (val)	SCOREQ 1.77	7
General Speech Restoration	VCTK-GSR (test)	SCOREQ 1.87	7
