RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

About

Artefacts that distinguish bonafide from fake audio can occur in both short- and long-range segments, so combining local and global feature information can discriminate between the two effectively. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use a sinc layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate the embeddings, enhancing audio context representation and combining short- and long-range information. The results show that the proposed RawBMamba achieves a 34.1% improvement over Rawformer on the ASVspoof 2021 LA dataset and demonstrates competitive performance on other datasets.
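To illustrate the unidirectional modelling problem mentioned in the abstract, here is a minimal sketch of a bidirectional scan. It is not the authors' implementation: `linear_scan` is a hypothetical fixed-parameter recurrence standing in for a Mamba block (which uses input-dependent, learned parameters), and fusing the two directions by summation is an assumption, since the abstract does not specify how the fusion module combines them.

```python
from typing import List


def linear_scan(xs: List[float], a: float = 0.9, b: float = 0.1) -> List[float]:
    """Causal recurrence h[t] = a * h[t-1] + b * x[t].

    A toy stand-in for a state space layer: the output at frame t
    depends only on frames 0..t, never on later frames.
    """
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], a: float = 0.9, b: float = 0.1) -> List[float]:
    """Run the scan left-to-right and right-to-left, then fuse.

    Fusion by element-wise sum is an assumption made for illustration.
    """
    forward = linear_scan(xs, a, b)
    # Reverse the input, scan, then reverse back so frames line up again.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print("forward only :", linear_scan(impulse))
    print("bidirectional:", bidirectional_scan(impulse))
```

With an impulse in the middle frame, the forward-only output is zero for every frame before the impulse, while the bidirectional output is non-zero on both sides. That is the sense in which a second, reversed pass gives each frame access to later context.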

Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan • 2024

Related benchmarks

Task	Dataset	Result
Audio Deepfake Detection	ASVspoof DF 2021	EER 15.85	47
Audio Deepfake Detection	ASVspoof LA 2021	EER 2.84	41
Spoofing Attack Detection	ASVspoof LA 2021	EER 3.21	19
Spoofing Attack Detection	ASVspoof DF 2021	EER 15.85	18
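The results above are equal error rates (EER): the operating point at which the rate of spoofed audio accepted equals the rate of bonafide audio rejected. The sketch below shows one simple way to estimate it from detection scores. The function name `compute_eer`, the convention that a higher score means "more bonafide", and the threshold sweep are assumptions for illustration; official ASVspoof evaluation tooling may compute the value differently.

```python
from typing import List, Tuple


def compute_eer(bonafide_scores: List[float],
                spoof_scores: List[float]) -> Tuple[float, float]:
    """Estimate the equal error rate by sweeping observed scores as thresholds.

    Assumes higher score = more likely bonafide. A trial is accepted
    when its score is >= the threshold. Returns (eer, threshold).
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = 1.0
    best_threshold = thresholds[0]
    for t in thresholds:
        # False rejection rate: bonafide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
            best_threshold = t
    return eer, best_threshold


if __name__ == "__main__":
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    eer, threshold = compute_eer(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

In this toy example one bonafide trial and one spoof trial overlap, so both error rates meet at 25%. Benchmark tables such as the one above conventionally report EER in percent, so lower is better.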
