
An Investigation of Incorporating Mamba for Speech Enhancement

About

This work investigates the use of Mamba, a recently proposed attention-free and scalable state-space model (SSM), for the speech enhancement (SE) task. In particular, we use Mamba to build regression-based SE models (SEMamba) in several configurations: basic, advanced, causal, and non-causal. We also consider loss functions based either on signal-level distances or on metric-oriented objectives. Experimental evidence shows that SEMamba attains a competitive PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration. A new state-of-the-art PESQ of 3.69 is also reported when SEMamba is combined with Perceptual Contrast Stretching (PCS). Compared with equivalent Transformer-based SE solutions, the advanced non-causal configurations reduce FLOPs by up to ~12%. Finally, SEMamba can be used as a pre-processing step before automatic speech recognition (ASR), showing competitive performance against recent SE solutions.
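The abstract contrasts causal and non-causal configurations built on Mamba. The sketch below is a simplified, single-layer selective state-space scan in the spirit of Mamba: the step size Δ and the projections B and C depend on the current input, and A is a negative diagonal matrix so the recurrence stays stable. It is written as an explicit Python loop rather than Mamba's hardware-aware parallel scan, and it is not SEMamba's code. All parameter names and the `make_params` helper are hypothetical; the non-causal variant simply adds a second scan over the time-reversed sequence, which is one common bidirectional design and is assumed here.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def make_params(d_model, d_state, rng):
    """Hypothetical random parameters for one selective-SSM layer (illustration only)."""
    return {
        # Negative real diagonal A, so exp(delta * A) lies in (0, 1) and the state decays.
        "A": -0.1 * np.tile(np.arange(1, d_state + 1, dtype=float), (d_model, 1)),
        "W_delta": 0.1 * rng.standard_normal((d_model, d_model)),
        "b_delta": np.zeros(d_model),
        "W_B": rng.standard_normal((d_model, d_state)) / np.sqrt(d_model),
        "W_C": rng.standard_normal((d_model, d_state)) / np.sqrt(d_model),
        "D": np.ones(d_model),  # skip connection weight
    }


def selective_scan(x, p):
    """Causal scan over x of shape (L, d_model): y[t] depends only on x[:t+1]."""
    h = np.zeros_like(p["A"])  # hidden state, shape (d_model, d_state)
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        xt = x[t]
        # Input-dependent ("selective") step size and projections.
        delta = softplus(xt @ p["W_delta"] + p["b_delta"])  # (d_model,)
        B = xt @ p["W_B"]                                   # (d_state,)
        C = xt @ p["W_C"]                                   # (d_state,)
        # Discretize: exponential for A, simple delta * B for B.
        A_bar = np.exp(delta[:, None] * p["A"])             # (d_model, d_state)
        B_bar = delta[:, None] * B[None, :]                 # (d_model, d_state)
        h = A_bar * h + B_bar * xt[:, None]
        y[t] = h @ C + p["D"] * xt
    return y


def bidirectional_scan(x, p_fwd, p_bwd):
    """Non-causal variant: forward scan plus a scan over the time-reversed input."""
    y_fwd = selective_scan(x, p_fwd)
    y_bwd = selective_scan(x[::-1], p_bwd)[::-1]
    return y_fwd + y_bwd


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))  # 6 frames, 4 channels
y_causal = selective_scan(x, make_params(4, 8, rng))
```

Because the causal scan only carries state forward, perturbing the last frame leaves every earlier output unchanged, whereas in the bidirectional version the first output also changes. That look-ahead is what a streaming (causal) deployment has to give up.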

Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao• 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech Enhancement | VoiceBank + DEMAND (VB-DMD) (test) | PESQ | 3.69 | 105 |
| Speech Enhancement | VCTK+DEMAND (test) | WB-PESQ | 3.52 | 13 |
| Phase Retrieval | VoiceBank Corpus (test) | PESQ | 4.59 | 8 |
| Speech Denoising | VoiceBank+DEMAND (test) | PESQ | 3.564 | 7 |
| Speech Dereverberation | WSJ0+WHAMR! (test) | WB-PESQ | 3.577 | 5 |
| Composite Denoising and Dereverberation | WSJ0+WHAMR! (test) | WB-PESQ | 2.372 | 5 |
| Speech Denoising | WSJ0+WHAMR! (test) | WB-PESQ | 2.658 | 5 |
| Composite Denoising, Dereverberation, and Bandwidth Extension | WSJ0+WHAMR! (test) | WB-PESQ | 2.066 | 5 |
| Speech Bandwidth Extension | WSJ0+WHAMR! (test) | WB-PESQ | 3.305 | 5 |
| Speech Denoising | DNS Non-Reverberant 2020 (test) | PESQ | 2.44 | 5 |
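The results above are PESQ scores, and the abstract mentions metric-oriented losses. One common way to build such a loss is a MetricGAN-style discriminator that learns to predict a normalized PESQ score, while the enhancement model is trained to push that prediction toward the maximum. The sketch below assumes this design, the (score - 1) / 3.5 normalization for scores in [1, 4.5], a plain magnitude MSE as the signal-level term, and hypothetical weights; it illustrates the idea and is not SEMamba's exact loss.

```python
import numpy as np


def normalize_pesq(score):
    """Map a PESQ score assumed to lie in [1, 4.5] onto [0, 1] (MetricGAN-style convention)."""
    return (score - 1.0) / 3.5


def metric_discriminator_loss(d_clean, d_enhanced, pesq_enhanced):
    """Least-squares loss for a metric discriminator D(., clean).

    d_clean:    D's prediction for (clean, clean); its target is 1.
    d_enhanced: D's prediction for (enhanced, clean); its target is the normalized PESQ.
    """
    target = normalize_pesq(np.asarray(pesq_enhanced, dtype=float))
    return np.mean((d_clean - 1.0) ** 2) + np.mean((d_enhanced - target) ** 2)


def metric_generator_loss(d_enhanced):
    """Metric-oriented term: push D(enhanced, clean) toward 1, i.e. maximum predicted quality."""
    return np.mean((d_enhanced - 1.0) ** 2)


def total_generator_loss(est_mag, clean_mag, d_enhanced, w_mag=1.0, w_metric=0.05):
    """Signal-level distance (magnitude MSE) plus a weighted metric term; weights are hypothetical."""
    signal_loss = np.mean((est_mag - clean_mag) ** 2)
    return w_mag * signal_loss + w_metric * metric_generator_loss(d_enhanced)


print(normalize_pesq(3.69))  # ≈ 0.769
```

Keeping the PESQ-driven term small relative to the signal-level term is a typical choice, since the discriminator's prediction is only an approximation of the true metric.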
