AFSS: Artifact-Focused Self-Synthesis for Mitigating Bias in Audio Deepfake Detection
About
The rapid advancement of generative models has enabled highly realistic audio deepfakes, yet current detectors suffer from a critical bias problem, leading to poor generalization across unseen datasets. This paper proposes Artifact-Focused Self-Synthesis (AFSS), a method designed to mitigate this bias by generating pseudo-fake samples from real audio via two mechanisms: self-conversion and self-reconstruction. The core insight of AFSS lies in enforcing same-speaker constraints, ensuring that real and pseudo-fake samples share identical speaker identity and semantic content. This forces the detector to focus exclusively on generation artifacts rather than irrelevant confounding factors. Furthermore, we introduce a learnable reweighting loss to dynamically emphasize synthetic samples during training. Extensive experiments across 7 datasets demonstrate that AFSS achieves state-of-the-art performance with an average EER of 5.45%, including a significant reduction to 1.23% on WaveFake and 2.70% on In-the-Wild, all while eliminating the dependency on pre-collected fake datasets. Our code is publicly available at https://github.com/NguyenLeHaiSonGit/AFSS.
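The learnable reweighting loss mentioned above can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the AFSS implementation: the function name `reweighted_bce`, the scalar parameter `alpha`, and the sigmoid weighting form are all hypothetical choices made here to show the general idea of giving pseudo-fake samples a trainable weight while real samples keep weight 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reweighted_bce(probs, labels, is_pseudo_fake, alpha):
    """Binary cross-entropy where pseudo-fake samples receive a learnable
    weight w = sigmoid(alpha) (alpha would be optimized jointly with the
    detector); real samples keep weight 1.  All names/forms are illustrative.
    probs: detector outputs in (0, 1); labels: 1 = bona fide, 0 = fake."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    eps = 1e-12  # numerical safety for log
    bce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    w = np.where(np.asarray(is_pseudo_fake, dtype=bool), sigmoid(alpha), 1.0)
    return float(np.mean(w * bce))
```

In a real training loop, `alpha` would be a parameter updated by gradient descent, letting the model decide how strongly the self-synthesized samples drive the loss.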
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Audio Deepfake Detection | In-the-Wild | EER 2.70 | 64 |
| Spoof Speech Detection | ASVspoof LA 2021 (eval) | -- | 36 |
| Synthetic Speech Detection | ASVspoof DF 2021 (eval) | -- | 25 |
| Audio anti-spoofing | WaveFake | EER 1.23 | 8 |
| Audio anti-spoofing | ASVspoof DF 2021 (eval) | EER 2.19 | 8 |
| Audio anti-spoofing | ASVspoof LA 2021 (eval) | EER 0.1002 | 8 |
| Audio anti-spoofing | ASVspoof DF 2021 (hidden) | EER 6.92 | 7 |
| Audio anti-spoofing | ASVspoof LA 2021 (hidden) | EER 12.35 | 7 |
| Audio anti-spoofing | ASVspoof 2019 (eval) | EER 2.72 | 6 |
| Synthetic Speech Detection | ASVspoof LA 2021 (hidden) | AUC 94.4 | 5 |
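Most results above are reported as Equal Error Rate (EER): the operating point where the false-acceptance rate on spoofed audio equals the false-rejection rate on bona fide audio. As a quick reference, here is a minimal sketch of computing EER from raw detector scores via a threshold sweep; the function name and score convention (higher score = more likely bona fide) are illustrative choices, not taken from the AFSS codebase.

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate from detector scores.
    scores: higher means more likely bona fide; labels: 1 = bona fide, 0 = spoof.
    Sweeps thresholds at every sorted score and returns the rate where the
    false-accept rate (spoof accepted) meets the false-reject rate
    (bona fide rejected)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)[np.argsort(scores)]  # labels in score order
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # After rejecting the i+1 lowest-scoring samples:
    frr = np.cumsum(labels) / n_pos           # bona fide wrongly rejected
    far = 1.0 - np.cumsum(1 - labels) / n_neg # spoof still accepted
    idx = np.argmin(np.abs(far - frr))        # crossing point of the two rates
    return float((far[idx] + frr[idx]) / 2)
```

For example, a perfectly separating detector (all bona fide scores above all spoof scores) yields an EER of 0.0.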