Diffusion Reconstruction towards Generalizable Audio Deepfake Detection

About

Robust generalization to unseen attacks remains a challenge in Audio Deepfake Detection (ADD), driven by the rapid evolution of generative models. To address this, we propose a framework centered on hard sample classification. The core idea is that a model able to distinguish hard samples is inherently equipped to handle simpler cases. We investigate multiple reconstruction paradigms and identify the diffusion-based method as the most effective for generating hard samples. Furthermore, we leverage multi-layer feature aggregation and introduce a Regularization-Assisted Contrastive Learning (RACL) objective to enhance generalizability. Experiments demonstrate the superior generalization of our approach, with our best model achieving a significant reduction in the average Equal Error Rate (EER) compared to the baseline.
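
The abstract mentions multi-layer feature aggregation without giving its form. As a rough illustration only, one common way to aggregate is a softmax-weighted sum over the hidden states of an encoder's layers. The sketch below assumes that form; the function name `aggregate_layers`, the array shapes, and the use of learnable per-layer logits are all hypothetical and are not taken from the paper.

```python
import numpy as np

def aggregate_layers(hidden_states, layer_logits):
    """Softmax-weighted sum over layers (an assumed form of aggregation).

    hidden_states: array of shape (num_layers, time, dim)
    layer_logits:  array of shape (num_layers,), one score per layer
    returns:       array of shape (time, dim)
    """
    hidden_states = np.asarray(hidden_states, dtype=float)
    layer_logits = np.asarray(layer_logits, dtype=float)
    # Numerically stable softmax over the layer axis.
    weights = np.exp(layer_logits - layer_logits.max())
    weights = weights / weights.sum()
    # Contract the layer axis: sum_l weights[l] * hidden_states[l].
    return np.tensordot(weights, hidden_states, axes=(0, 0))

# Hypothetical input: 3 layers, 4 frames, 2 feature dimensions.
states = np.arange(24, dtype=float).reshape(3, 4, 2)
pooled = aggregate_layers(states, np.zeros(3))  # equal logits give a plain mean
```

With equal logits the result reduces to the mean over layers; in a trained model the logits would typically be learned so that more informative layers receive larger weights.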

Bo Cheng, Songjun Cao, Xiaoming Zhang, Jie Chen, Long Ma, Fei Chen • 2026

Related benchmarks

Task	Dataset	Result
Audio Deepfake Detection	ITW In-the-Wild	EER 9.155	51
Audio Deepfake Detection	CodecFake	EER 20.198	50
Audio Deepfake Detection	ASVspoof LA 2019 (eval)	EER 0.206	36
Audio anti-spoofing	WaveFake	EER 1.597	15
Audio Deepfake Detection	DiffSSD	EER 10.081	7
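
Every result in the table is an Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how an EER can be computed from detector scores. It assumes that a higher score means "more likely bona fide"; the function name `compute_eer` and the toy scores are illustrative, not from the paper, and published evaluation toolkits may interpolate between thresholds rather than pick the nearest one as done here.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold), where eer is a fraction in [0, 1].

    Assumes higher scores indicate bona fide audio.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every distinct observed score, ascending.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    # False rejection rate: bona fide samples scoring below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed samples scoring at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy scores, invented purely for illustration.
eer, thr = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
```

Multiply the returned fraction by 100 if a percentage is wanted.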
