Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

About

Speech synthesis technology poses a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods use pretrained models, and integrating features from multiple layers of the pretrained model further improves detection performance. However, most previously proposed fusion methods require fine-tuning the pretrained model, which leads to long training times and hinders rapid model iteration when new speech synthesis technologies emerge. To address this issue, this paper proposes a feature fusion method based on a Mixture of Experts (MoE) that keeps the pretrained model frozen. The method extracts and integrates features relevant to fake audio detection from the layer features, guided by a gating network that takes the last-layer feature as input. Experiments on the ASVspoof2019 and ASVspoof2021 datasets show that the proposed method achieves performance competitive with methods that require fine-tuning.
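The fusion idea described above can be illustrated with a small forward-pass sketch. It is not the authors' implementation: one linear expert per layer, a linear softmax gate applied to the last layer's feature, layer features that are already mean-pooled over time, random weights, toy sizes, and the names `MoEFusion` and `matvec` are all assumptions made for illustration. In the setup the abstract describes, only a head like this would be trained, while the wav2vec 2.0 layer features it consumes come from the frozen model (stood in for here by random vectors).

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoEFusion:
    """Hypothetical MoE fusion head over features from a frozen encoder.

    Assumptions (not from the paper): one linear expert per layer, a linear
    gate applied to the last layer's feature, softmax gate weights, and
    layer features already mean-pooled over time.
    """

    def __init__(self, num_layers, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # experts[i] projects layer i's feature from in_dim to out_dim.
        self.experts = [random_matrix(out_dim, in_dim, rng) for _ in range(num_layers)]
        # The gate maps the last-layer feature to one logit per expert.
        self.gate = random_matrix(num_layers, in_dim, rng)
        self.out_dim = out_dim

    def gate_weights(self, layer_feats):
        # Only the last layer's feature drives the gating network.
        return softmax(matvec(self.gate, layer_feats[-1]))

    def __call__(self, layer_feats):
        weights = self.gate_weights(layer_feats)
        fused = [0.0] * self.out_dim
        # Gate-weighted sum of expert outputs, one expert per layer.
        for w, expert, h in zip(weights, self.experts, layer_feats):
            y = matvec(expert, h)
            for k in range(self.out_dim):
                fused[k] += w * y[k]
        return fused, weights


# Usage: random stand-ins for 12 pooled layer features of size 16.
rng = random.Random(1)
layer_feats = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(12)]
moe = MoEFusion(num_layers=12, in_dim=16, out_dim=8)
fused, weights = moe(layer_feats)
print(len(fused), round(sum(weights), 6))  # 8 1.0
```

Conditioning the gate on the last layer only means the mixing weights come from a single feature per utterance, while every layer still contributes through its own expert.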

Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xiaopeng Wang, Yuankun Xie, Xin Qi, Shuchen Shi, Yi Lu, Yukun Liu, Chenxing Li, Xuefei Liu, Guanjun Li • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Audio Deepfake Detection | in the wild | EER | 12.48 | 58
Spoof Speech Detection | ASVspoof LA 2021 (eval) | -- | -- | 36
Audio Deepfake Detection | ASVspoof DF 2021 | EER | 2.54 | 35
Synthetic Speech Detection | ASVspoof DF 2021 (eval) | EER (%) | 2.54 | 19
Speech Spoofing Detection | In-the-Wild (ITW) (eval) | EER | 9.17 | 19
Audio Deepfake Detection | ASVspoof LA and DF 2021 | EER (DF) | 2.54 | 17
Audio Deepfake Detection | ASVspoof LA 2021 | EER | 2.96 | 12
Deepfake Audio Detection | ASVspoof LA 2019 | EER (%) | 74 | 12
Audio Deepfake Detection | ASVspoof LA 2019 | EER | 74 | 11
Voice Anti-spoofing | in-the-wild (test) | EER | 9.17 | 7
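Most results in the table are equal error rates (EER): the error rate at the decision threshold where the false acceptance rate on spoofed audio equals the false rejection rate on bona fide audio, so lower is better. The function below is a simple threshold-sweep approximation written for illustration, not the official ASVspoof scoring code; the score convention (higher means more likely bona fide) and the name `compute_eer` are assumptions.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate EER (in percent) by trying every observed score as a threshold.

    Assumed convention: a higher score means "more likely bona fide", and a
    trial is accepted as bona fide when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return 100.0 * best_eer


print(compute_eer([0.9, 0.8, 0.4, 0.7], [0.1, 0.5, 0.3, 0.2]))  # 25.0
```

With a finite set of trials the two error rates may never be exactly equal, so the sweep keeps the threshold where they are closest and reports their average.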