XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection

About

Transformers and their variants have achieved great success in speech processing, but their multi-head self-attention mechanism is computationally expensive. Mamba, a novel selective state space model, has been proposed as an alternative. Building on its success in automatic speech recognition, we apply Mamba to spoofing attack detection. Mamba is well suited to this task because it can handle long sequences and thereby capture the artifacts in spoofed speech signals. However, its performance may suffer when it is trained with limited labeled data. To mitigate this, we propose combining a new Mamba structure based on a dual-column architecture with self-supervised learning, using the pre-trained wav2vec 2.0 model. Experiments show that the proposed approach achieves competitive results and faster inference on the ASVspoof 2021 LA and DF datasets; on the more challenging In-the-Wild dataset, it emerges as the strongest candidate for spoofing attack detection. The code has been publicly released at https://github.com/swagshaw/XLSR-Mamba.

Yang Xiao, Rohan Kumar Das • 2024
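
The abstract does not spell out the dual-column bidirectional structure, so the following is only a toy sketch of one plausible reading: two parallel stacks ("columns") of sequence layers, one reading the frame-level features in time order and the other reading them reversed, with the two outputs re-aligned and concatenated. Everything here is an assumption made for illustration: the names `toy_scan` and `dual_column`, the residual connections, the concatenation, and above all the fixed-decay recurrence, which merely stands in for a state space layer and is not Mamba's selective scan or the authors' released code.

```python
import numpy as np


def toy_scan(x, decay=0.9):
    """Causal recurrence h[t] = decay * h[t-1] + (1 - decay) * x[t].

    Hypothetical stand-in for a state space layer; NOT Mamba's selective scan.
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out


def dual_column(features, num_layers=2):
    """Run a forward and a backward column over (time, dim) features.

    Returns an array of shape (time, 2 * dim). The layout is an assumption,
    not the paper's exact architecture.
    """
    features = np.asarray(features, dtype=float)
    fwd = features            # forward column reads frames in time order
    bwd = features[::-1]      # backward column reads the time-reversed frames
    for _ in range(num_layers):
        fwd = fwd + toy_scan(fwd)   # residual connection (assumed)
        bwd = bwd + toy_scan(bwd)
    bwd = bwd[::-1]           # restore the original time order
    return np.concatenate([fwd, bwd], axis=-1)


if __name__ == "__main__":
    # Random features stand in for wav2vec 2.0 frame embeddings.
    frames = np.random.default_rng(0).normal(size=(5, 3))
    print(dual_column(frames).shape)
```

In this sketch each output frame carries context from both the past (forward column) and the future (backward column), which is the general motivation for bidirectional processing of an utterance.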

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Audio Deepfake Detection | in the wild | EER 6.7 | 58
Spoof Speech Detection | ASVspoof LA 2021 (eval) | -- | 36
Audio Deepfake Detection | ASVspoof DF 2021 | EER 1.88 | 35
Audio Deepfake Detection | ASVspoof LA 2021 | EER 0.93 | 23
Synthetic Speech Detection | ASVspoof DF 2021 (eval) | EER (%) 1.88 | 19
Speech Spoofing Detection | In-the-Wild (ITW) (eval) | EER 6.71 | 19
Audio Deepfake Detection | ASVspoof LA and DF 2021 | EER (DF) 1.88 | 17
Audio Deepfake Detection | ASVspoof LA 2021 | EER 0.93 | 12
Audio Deepfake Detection | ASVspoof LA 2019 | EER 42.1 | 11
Spoofing Attack Detection | ASVspoof LA 2021 | EER 0.93 | 9
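
The results above are reported as equal error rate (EER): the operating point at which the rate of spoofed utterances accepted as bona fide equals the rate of bona fide utterances rejected. As a rough illustration of how such a number can be obtained from detector scores, here is a minimal NumPy sketch. It assumes higher scores mean "more likely bona fide"; the function name `compute_eer` and the example scores are hypothetical, and this is not the official ASVspoof evaluation script, which may interpolate between thresholds differently.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher score = more likely bona fide."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every distinct score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Made-up scores for illustration only.
    eer, thr = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])
    print(f"EER = {100 * eer:.2f}% at threshold {thr}")
```

A lower EER is better; the value is usually quoted as a percentage, as in the "EER (%)" row of the table.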
