State Space Models for Bioacoustics: A Comparative Evaluation with Transformers
About
In this study, we evaluate the efficacy of the Mamba architecture for bioacoustics by introducing BioMamba, a Mamba-based audio representation model for wildlife sounds. We pre-train BioMamba using self-supervised learning on a large audio corpus and evaluate it on the BEANS benchmark across diverse classification and detection tasks. Compared to the state-of-the-art Transformer-based model (AVES), BioMamba achieves comparable performance while significantly reducing VRAM consumption. Our results demonstrate Mamba's potential as a computationally efficient alternative for real-world environmental monitoring.
Chengyu Tang, Sanjeev Baskiyar • 2025
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Detection | BEANS | dcase: 0.426 | 7 |
| Classification | BEANS | Accuracy (bats): 72.5 | 7 |