State Space Models for Bioacoustics: A Comparative Evaluation with Transformers

About

In this study, we evaluate the efficacy of the Mamba architecture for bioacoustics by introducing BioMamba, a Mamba-based audio representation model for wildlife sounds. We pre-train BioMamba using self-supervised learning on a large audio corpus and evaluate it on the BEANS benchmark across diverse classification and detection tasks. Compared to the state-of-the-art Transformer-based model (AVES), BioMamba achieves comparable performance while significantly reducing VRAM consumption. Our results demonstrate Mamba's potential as a computationally efficient alternative for real-world environmental monitoring.

Chengyu Tang, Sanjeev Baskiyar • 2025

Related benchmarks

Task	Dataset	Result	Rank
Detection	BEANS	dcase: 0.426	7
Classification	BEANS	Accuracy (bats): 72.5	7
