SepMamba: State-space models for speaker separation using Mamba

About

Deep learning-based single-channel speaker separation has improved significantly in recent years, largely due to the introduction of the transformer-based attention mechanism. However, these improvements come at the expense of intense computational demands, precluding their use in many practical applications. As a computationally efficient alternative with similar modeling capabilities, Mamba was recently introduced. We propose SepMamba, a U-Net-based architecture composed primarily of bidirectional Mamba layers. We find that our approach outperforms similarly sized prominent models, including transformer-based models, on the WSJ0 2-speaker dataset, while enjoying a significant reduction in computational cost, memory usage, and forward-pass time. We additionally report strong results for causal variants of SepMamba. Our approach provides a computationally favorable alternative to transformer-based architectures for deep speech separation.
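The core building block is easy to sketch. Below is a minimal, hypothetical PyTorch illustration of a bidirectional Mamba layer, assuming the `mamba_ssm` package's `Mamba` block; the wrapper (one pass over the sequence as given, one over its time reversal, merged through a residual connection) illustrates the idea rather than reproducing the authors' exact layer.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm

class BiMambaBlock(nn.Module):
    """Bidirectional Mamba sketch: one Mamba scans the sequence forward,
    a second scans the time-reversed sequence, and the outputs are merged.
    Illustrative only, not the authors' exact implementation."""

    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = Mamba(d_model=d_model)
        self.bwd = Mamba(d_model=d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        y_fwd = self.fwd(x)
        y_bwd = self.bwd(torch.flip(x, dims=[1]))
        y_bwd = torch.flip(y_bwd, dims=[1])  # re-align with the forward time axis
        return self.norm(x + y_fwd + y_bwd)  # residual merge of both directions
```

Dropping the backward branch yields a causal variant of the block, which is what makes the causal SepMamba configurations mentioned above possible.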

Thor H{\o}jhus Avenstrup, Boldizs\'ar Elek, Istv\'an L\'aszl\'o M\'adi, Andr\'as Bence Schin, Morten M{\o}rup, Bj{\o}rn Sand Jensen, Kenny Falk{\ae}r Olsen• 2024

Related benchmarks

Task                Dataset             Metric       Result   Rank
Speech Separation   WSJ0-2Mix (test)    SDRi (dB)    22.9     141
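
For reference, SDRi measures the improvement in signal-to-distortion ratio of the separated estimate over the unprocessed mixture. Below is a minimal numpy sketch using the simple energy-ratio form; function names are illustrative, and note that BSS Eval's SDR additionally allows a short distortion filter, so reported benchmark numbers can differ slightly from this simplified version.

```python
import numpy as np

def sdr_db(est: np.ndarray, ref: np.ndarray) -> float:
    """Energy-ratio SDR in dB: reference power over residual-error power."""
    err = est - ref
    return 10.0 * np.log10(np.sum(ref ** 2) / (np.sum(err ** 2) + 1e-12))

def sdr_improvement(est: np.ndarray, mix: np.ndarray, ref: np.ndarray) -> float:
    """SDRi: SDR of the separated estimate minus SDR of the raw mixture,
    both measured against the same reference signal."""
    return sdr_db(est, ref) - sdr_db(mix, ref)
```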

Other info

Code
