
BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

About

We propose BiCrossMamba-ST, a robust framework for speech deepfake detection that leverages a dual-branch spectro-temporal architecture powered by bidirectional Mamba blocks and mutual cross-attention. By processing spectral sub-bands and temporal intervals in separate branches and then integrating their representations, BiCrossMamba-ST effectively captures the subtle cues of synthetic speech. In addition, the framework uses a convolution-based 2D attention map to focus on specific spectro-temporal regions, enabling robust deepfake detection. Operating directly on raw features, BiCrossMamba-ST achieves significant performance improvements: relative gains of 67.74% and 26.3% over the state-of-the-art AASIST on the ASVspoof LA21 and ASVspoof DF21 benchmarks, respectively, and a 6.80% improvement over RawBMamba on ASVspoof DF21. Code and models will be made publicly available.
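The authors' implementation is not part of this page (the abstract says code and models will be released later), so what follows is only a toy, pure-Python illustration of two ideas the abstract names: a bidirectional scan over a token sequence, and mutual cross-attention between a spectral branch and a temporal branch. Everything in it is an assumption for illustration. The function names are hypothetical. The scalar input-gated recurrence is a much simpler stand-in for Mamba's selective state-space scan. The forward and backward outputs are simply summed, the attention has no learned projections, and the convolution-based 2D attention map is omitted entirely.

```python
import math
import random

# Toy sketch only (hypothetical names, not the authors' code).
# Tokens are plain lists of floats; a sequence is a list of tokens.


def selective_scan(seq, w=1.0):
    """Left-to-right scan with an input-dependent decay per feature.

    Simplified stand-in for a selective state-space scan:
    h_t = a_t * h_{t-1} + (1 - a_t) * x_t, with a_t = sigmoid(w * x_t).
    """
    h = [0.0] * len(seq[0])
    out = []
    for x in seq:
        a = [1.0 / (1.0 + math.exp(-w * xi)) for xi in x]
        h = [ai * hi + (1.0 - ai) * xi for ai, hi, xi in zip(a, h, x)]
        out.append(h)  # h is a fresh list each step, so no aliasing
    return out


def bidirectional_scan(seq, w=1.0):
    """Sum a forward scan and a backward scan (run on the reversed sequence)."""
    fwd = selective_scan(seq, w)
    bwd = selective_scan(seq[::-1], w)[::-1]  # restore original token order
    return [[f + b for f, b in zip(fv, bv)] for fv, bv in zip(fwd, bwd)]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention; keys and values are the same tokens here."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(wt * kv[j] for wt, kv in zip(weights, keys_values)) for j in range(d)])
    return out


def mutual_cross_attention(spectral, temporal):
    """Each branch queries the other; no learned projections in this toy version."""
    return cross_attention(spectral, temporal), cross_attention(temporal, spectral)


def mean_pool(seq):
    return [sum(col) / len(seq) for col in zip(*seq)]


# Usage: 6 spectral sub-band tokens and 10 temporal tokens, 4 features each.
random.seed(0)
spectral = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
temporal = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]
spec_ctx, temp_ctx = mutual_cross_attention(bidirectional_scan(spectral), bidirectional_scan(temporal))
fused = mean_pool(spec_ctx) + mean_pool(temp_ctx)  # list concatenation: 8 features
print(len(fused))  # 8
```

Summing the two scan directions gives every token context from both earlier and later tokens, which a single left-to-right scan cannot provide. Cross-attending in both directions lets each branch re-weight its tokens using the other branch's view of the same utterance. In a real model both steps would use learned parameters.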

Yassine El Kheir, Tim Polzehl, Sebastian Möller • 2025

Related benchmarks

| Task | Dataset | EER | Rank |
| --- | --- | --- | --- |
| Audio Deepfake Detection | In-the-Wild | 7.94 | 64 |
| Audio Deepfake Detection | ASVspoof DF 2021 | 2.35 | 47 |
| Audio Deepfake Detection | ASVspoof LA 2021 | 3.39 | 41 |
| Audio Deepfake Detection | ASVspoof LA 2019 | 71 | 30 |
| Audio Deepfake Detection | FoR | 6.85 | 27 |
| Audio Deepfake Detection | ADD Track 1 2022 | 30.44 | 19 |
| Audio Deepfake Detection | SONAR | 27.36 | 19 |
| Audio Deepfake Detection | CodecFake | 37.7 | 19 |
| Audio Deepfake Detection | ADD Track 3 2022 | 18.69 | 19 |
| Audio Deepfake Detection | ADD 2023 R1 | 29.44 | 19 |

(10 of 23 rows shown.)
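Every result in the table above is an equal error rate (EER). This is the operating point where the rate of spoofed utterances accepted as bona fide equals the rate of bona fide utterances rejected. The listing does not state units; EERs on these benchmarks are usually reported in percent. What follows is a minimal sketch with hypothetical function names. It assumes higher scores mean "more likely bona fide" and sweeps thresholds over the observed scores instead of interpolating a DET curve, as official evaluation scripts may do. It also includes a hypothetical `relative_reduction` helper showing one common way a relative gain such as the abstract's 67.74% can be computed from two EERs. The paper's exact definition is not given on this page.

```python
def compute_eer(scores, labels):
    """Approximate EER by sweeping thresholds.

    Assumption: label 1 = bona fide, label 0 = spoof, higher score = more bona fide.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), None
    for thr in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= thr for s in neg) / len(neg)  # spoof accepted as bona fide
        frr = sum(s < thr for s in pos) / len(pos)   # bona fide rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent: (baseline - new) / baseline * 100."""
    return (baseline_eer - new_eer) / baseline_eer * 100.0


if __name__ == "__main__":
    scores = [0.9, 0.4, 0.6, 0.1]
    labels = [1, 1, 0, 0]
    print(f"EER = {100 * compute_eer(scores, labels):.2f}%")  # EER = 50.00%
```

Multiplying by 100 converts the fraction to the percent scale used in most reported results.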
