Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier

About

With the rapid development of speech synthesis and voice conversion technologies, audio deepfakes have become a serious threat to Automatic Speaker Verification (ASV) systems. Numerous countermeasures have been proposed to detect this type of attack. In this paper, we report our efforts to combine the self-supervised WavLM model with a Multi-Fusion Attentive classifier for audio deepfake detection. Our method is the first to exploit the WavLM model to extract features that are more conducive to spoofing detection. We then propose a novel Multi-Fusion Attentive (MFA) classifier based on the Attentive Statistics Pooling (ASP) layer. The MFA captures the complementary information of audio features at both the time and layer levels. Experiments demonstrate that our method achieves state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the ASVspoof 2019 and 2021 LA sets.

Yinlin Guo, Haofan Huang, Xi Chen, He Zhao, Yuehai Wang • 2023
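
The abstract describes pooling features at both the time and layer levels with Attentive Statistics Pooling. The snippet below is a minimal sketch of standard ASP (attention-weighted mean and standard deviation over frames), applied to each layer separately and then concatenated across layers. It is not the authors' MFA implementation: the random arrays stand in for WavLM hidden states, and the parameter names (`W`, `b`, `v`), the dimensions, and the concatenation step are all illustrative assumptions.

```python
import numpy as np

def attentive_stats_pooling(h, W, b, v):
    """Pool frame features h of shape (T, D) into a (2*D,) vector.

    W: (D, A), b: (A,), v: (A,) are attention parameters (hypothetical names).
    """
    scores = np.tanh(h @ W + b) @ v           # one scalar score per frame, shape (T,)
    w = np.exp(scores - scores.max())         # subtract max for numerical stability
    alpha = w / w.sum()                       # softmax attention weights over time
    mean = alpha @ h                          # attention-weighted mean, shape (D,)
    var = alpha @ (h ** 2) - mean ** 2        # attention-weighted variance
    std = np.sqrt(np.clip(var, 1e-8, None))   # clip to avoid sqrt of tiny negatives
    return np.concatenate([mean, std])

rng = np.random.default_rng(0)
L, T, D, A = 3, 50, 8, 4                      # layers, frames, feature dim, attention dim (illustrative)
layers = rng.normal(size=(L, T, D))           # stand-in for per-layer self-supervised features
W = rng.normal(size=(D, A))
b = np.zeros(A)
v = rng.normal(size=A)

# Time-level pooling per layer, then a simple layer-level concatenation (assumption).
pooled = np.stack([attentive_stats_pooling(layers[l], W, b, v) for l in range(L)])
utterance_embedding = pooled.reshape(-1)      # shape (L * 2 * D,)
```

With all-zero attention scores the weights become uniform, so the pooled vector reduces to the plain per-dimension mean and standard deviation; the attention parameters are what let the pooling emphasize some frames over others.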

Related benchmarks

Task | Dataset | Result | Rank
Spoof Speech Detection | ASVspoof LA 2021 (eval) | -- | 36
Audio Deepfake Detection | ASVspoof DF 2021 | EER 2.56 | 35
Audio Deepfake Detection | ASVspoof 2021 | EER 2.56 | 27
Synthetic Speech Detection | ASVspoof DF 2021 (eval) | EER (%) 2.56 | 19
Audio Deepfake Detection | ASVspoof LA and DF 2021 | EER (DF) 2.56 | 17
Deepfake Audio Detection | ASVspoof LA 2019 | EER (%) 42 | 12
Audio Deepfake Detection | ASVspoof LA 2021 | EER 5.08 | 12
Audio Deepfake Detection | ASVspoof LA 2019 | EER 42 | 11
Spoofing Attack Detection | ASVspoof LA 2021 | EER 5.08 | 9
Spoofing Attack Detection | ASVspoof DF 2021 | EER 2.56 | 8
Showing 10 of 11 rows
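
The results above are reported as EER (equal error rate), the operating point at which the rate of accepted spoofed utterances equals the rate of rejected bona fide utterances. The snippet below is a rough sketch of how such a value can be estimated from detector scores. It assumes higher scores mean "more likely bona fide" and uses a simple threshold sweep; the function name `compute_eer` and the toy scores are illustrative and not taken from the paper or the official ASVspoof scoring tools.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate (in percent) with a threshold sweep."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    # False rejection: bona fide scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance: spoof scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))        # threshold where the two rates are closest
    return 100.0 * (far[idx] + frr[idx]) / 2.0

separable = compute_eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping = compute_eer([0.2, 0.8], [0.2, 0.8])
```

Perfectly separated toy scores give an EER of 0, while identical score distributions give 50, which is chance level for a two-class detector.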
