AFSS: Artifact-Focused Self-Synthesis for Mitigating Bias in Audio Deepfake Detection
About
The rapid advancement of generative models has enabled highly realistic audio deepfakes, yet current detectors suffer from a critical bias problem, leading to poor generalization across unseen datasets. This paper proposes Artifact-Focused Self-Synthesis (AFSS), a method designed to mitigate this bias by generating pseudo-fake samples from real audio via two mechanisms: self-conversion and self-reconstruction. The core insight of AFSS lies in enforcing same-speaker constraints, ensuring that real and pseudo-fake samples share identical speaker identity and semantic content. This forces the detector to focus exclusively on generation artifacts rather than irrelevant confounding factors. Furthermore, we introduce a learnable reweighting loss to dynamically emphasize synthetic samples during training. Extensive experiments across 7 datasets demonstrate that AFSS achieves state-of-the-art performance with an average EER of 5.45%, including a significant reduction to 1.23% on WaveFake and 2.70% on In-the-Wild, all while eliminating the dependency on pre-collected fake datasets. Our code is publicly available at https://github.com/NguyenLeHaiSonGit/AFSS.
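The learnable reweighting loss mentioned above can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the AFSS implementation: the function name `reweighted_bce`, the scalar parameter `alpha`, and the sigmoid weighting form are all hypothetical choices made here to show the general idea of giving pseudo-fake samples a trainable weight while real samples keep weight 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reweighted_bce(probs, labels, is_pseudo_fake, alpha):
    """Binary cross-entropy where pseudo-fake samples receive a learnable
    weight w = sigmoid(alpha) (alpha would be optimized jointly with the
    detector); real samples keep weight 1.  All names/forms are illustrative.
    probs: detector outputs in (0, 1); labels: 1 = bona fide, 0 = fake."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    eps = 1e-12  # numerical safety for log
    bce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    w = np.where(np.asarray(is_pseudo_fake, dtype=bool), sigmoid(alpha), 1.0)
    return float(np.mean(w * bce))
```

In a real training loop, `alpha` would be a parameter updated by gradient descent, letting the model decide how strongly the self-synthesized samples drive the loss.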
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Audio Deepfake Detection | In-the-Wild | EER 2.70 | 64 |
| Spoof Speech Detection | ASVspoof LA 2021 (eval) | -- | 36 |
| Synthetic Speech Detection | ASVspoof DF 2021 (eval) | -- | 25 |
| Audio anti-spoofing | WaveFake | EER 1.23 | 8 |
| Audio anti-spoofing | ASVspoof DF 2021 (eval) | EER 2.19 | 8 |
| Audio anti-spoofing | ASVspoof LA 2021 (eval) | EER 0.1002 | 8 |
| Audio anti-spoofing | ASVspoof DF 2021 (hidden) | EER 6.92 | 7 |
| Audio anti-spoofing | ASVspoof LA 2021 (hidden) | EER 12.35 | 7 |
| Audio anti-spoofing | ASVspoof 2019 (eval) | EER 2.72 | 6 |
| Synthetic Speech Detection | ASVspoof LA 2021 (hidden) | AUC 94.4 | 5 |
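Most results above are reported as Equal Error Rate (EER): the operating point where the false-acceptance rate on spoofed audio equals the false-rejection rate on bona fide audio. As a quick reference, here is a minimal sketch of computing EER from raw detector scores via a threshold sweep; the function name and score convention (higher score = more likely bona fide) are illustrative choices, not taken from the AFSS codebase.

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate from detector scores.
    scores: higher means more likely bona fide; labels: 1 = bona fide, 0 = spoof.
    Sweeps thresholds at every sorted score and returns the rate where the
    false-accept rate (spoof accepted) meets the false-reject rate
    (bona fide rejected)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)[np.argsort(scores)]  # labels in score order
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # After rejecting the i+1 lowest-scoring samples:
    frr = np.cumsum(labels) / n_pos           # bona fide wrongly rejected
    far = 1.0 - np.cumsum(1 - labels) / n_neg # spoof still accepted
    idx = np.argmin(np.abs(far - frr))        # crossing point of the two rates
    return float((far[idx] + frr[idx]) / 2)
```

For example, a perfectly separating detector (all bona fide scores above all spoof scores) yields an EER of 0.0.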