
Diffusion Reconstruction towards Generalizable Audio Deepfake Detection

About

Achieving robust generalization against unseen attacks remains a challenge in Audio Deepfake Detection (ADD), driven by the rapid evolution of generative models. To address this, we propose a framework centered on hard sample classification. The core idea is that a model capable of distinguishing challenging hard samples is inherently equipped to handle simpler cases effectively. We investigate multiple reconstruction paradigms, identifying the diffusion-based method as optimal for generating hard samples. Furthermore, we leverage multi-layer feature aggregation and introduce a Regularization-Assisted Contrastive Learning (RACL) objective to enhance generalizability. Experiments demonstrate the superior generalization of our approach, with our best model achieving a significant reduction in the average Equal Error Rate (EER) compared to the baseline.

Bo Cheng, Songjun Cao, Xiaoming Zhang, Jie Chen, Long Ma, Fei Chen • 2026

Related benchmarks

Task                      Dataset                    Result      Rank
Audio Deepfake Detection  CodecFake                  EER 20.198  50
Audio Deepfake Detection  ITW In-the-Wild            EER 9.155   16
Audio anti-spoofing       WaveFake                   EER 1.597   15
Audio Deepfake Detection  ASVspoof LA 2019 (eval)    EER 0.206   8
Audio Deepfake Detection  DiffSSD                    EER 10.081  7
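
All results above are reported as Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how an EER can be estimated from detector scores. It is not the authors' evaluation code; the function name `compute_eer`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the EER and its threshold from two score arrays.

    Assumption: a higher score means the detector considers the
    utterance more likely to be bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, ascending.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error rates cross; take the closest point
    # and average the two rates there.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])

# Toy scores (hypothetical, not taken from the paper).
eer, threshold = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
print(eer, threshold)  # 0.25 0.6
```

With finite data the two error curves rarely cross exactly at an observed score, so evaluation toolkits often interpolate between neighbouring thresholds; the nearest-point approximation used here is a simplification.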