
BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

About

We propose BiCrossMamba-ST, a robust framework for speech deepfake detection that leverages a dual-branch spectro-temporal architecture powered by bidirectional Mamba blocks and mutual cross-attention. By processing spectral sub-bands and temporal intervals separately and then integrating their representations, BiCrossMamba-ST effectively captures the subtle cues of synthetic speech. In addition, the framework uses a convolution-based 2D attention map to focus on specific spectro-temporal regions, enabling robust deepfake detection. Operating directly on raw features, BiCrossMamba-ST achieves relative gains of 67.74% and 26.3% over the state-of-the-art AASIST on the ASVspoof LA21 and ASVspoof DF21 benchmarks, respectively, and a 6.80% improvement over RawBMamba on ASVspoof DF21. Code and models will be made publicly available.
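The relative gains quoted in the abstract can be read as relative reductions of an error metric against a baseline. The abstract does not name the metric, so the sketch below assumes it is the equal error rate (EER). The function name `relative_gain` and the numbers in the usage line are hypothetical and are not taken from the paper.

```python
def relative_gain(baseline_eer: float, model_eer: float) -> float:
    """Relative EER reduction in percent (positive means the model is better).

    Assumption: "relative gain" means (baseline - model) / baseline,
    applied to EER values expressed in the same unit.
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - model_eer) / baseline_eer


# Hypothetical values for illustration only, NOT results from the paper.
example_gain = relative_gain(baseline_eer=10.0, model_eer=4.0)
print(example_gain)  # 60.0
```

Under this reading, a relative gain of 50% would mean the model halves the baseline's error rate, whatever the absolute values are.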

Yassine El Kheir, Tim Polzehl, Sebastian Möller • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Audio Deepfake Detection | in the wild | EER | 7.94 | 58 |
| Audio Deepfake Detection | ASVspoof DF 2021 | EER | 2.35 | 35 |
| Audio Deepfake Detection | ASVspoof LA 2021 | EER | 3.39 | 23 |
| Audio Deepfake Detection | ASVspoof LA 2021 | EER | 3.83 | 12 |
| Audio Deepfake Detection | ASVspoof LA 2019 | EER | 71 | 11 |
| Audio Deepfake Detection | ASVspoof 5 | EER | 13.67 | 9 |
| Audio Deepfake Detection | ADD Track 1 2022 | F1 Score | 56.7 | 7 |
| Audio Deepfake Detection | ADD Track 1 2022 | EER | 30.44 | 7 |
| Audio Deepfake Detection | ASVspoof 2024 | F1 Score | 72 | 7 |
| Audio Deepfake Detection | LibriVoc | F1 Score | 92.9 | 7 |

Showing 10 of 23 rows
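Most rows above report EER (equal error rate), the operating point at which the false acceptance rate on spoofed audio equals the false rejection rate on bona fide audio. The sketch below shows one simple way to estimate it from detector scores. It assumes that higher scores mean "more likely bona fide"; the function name `compute_eer` and the toy scores are hypothetical, and official benchmark toolkits may interpolate between thresholds rather than pick the closest one.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the EER and its threshold by scanning all observed scores.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    Returns (eer, threshold), with eer as a fraction in [0, 1].
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores for illustration only, not benchmark data.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

In the toy example, one of four bona fide trials falls below 0.6 and one of four spoofed trials reaches it, so both error rates are 25%.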
