Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model

About

Generative models have enabled the creation of highly realistic synthetic facial images, raising significant concerns due to their potential for misuse. Despite rapid advancements in the field of deepfake detection, developing efficient approaches to leverage foundation models for improved generalizability to unseen forgery samples remains challenging. To address this challenge, we propose a novel side-network-based decoder that extracts spatial and temporal cues using the CLIP image encoder for generalized video-based deepfake detection. Additionally, we introduce Facial Component Guidance (FCG) to enhance spatial learning generalizability by encouraging the model to focus on key facial regions. By leveraging the generic features of a vision-language foundation model, our approach demonstrates promising generalizability on challenging deepfake datasets while also exhibiting superiority in training data efficiency, parameter efficiency, and model robustness.

Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, Jun-Cheng Chen • 2024

Related benchmarks

Task	Dataset	Metric	Result
Deepfake Detection	DFDC	AUC	81.8	230
Deepfake Detection	DFD	AUC	0.928	193
Deepfake Detection	CelebDF v2	AUC	0.95	134
Deepfake Detection	CDF v2	AUC	0.6852	97
Face Forgery Detection	DFDC	AUC	81.81	74
Deepfake Detection	CDFv1, CDFv2, DFD, DFDCP, DFDC (test)	Overall Average Score	88.4	74
Deepfake Detection	FaceForensics++ (test)	AUC	76.08	65
Image Deepfake Detection	DFo	AUC	0.7571	62
Deepfake Detection	DFDCP (test)	AUC	90.04	56
Deepfake Detection	WDF	AUC	0.875	54

Showing 10 of 47 rows
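
Most rows above report AUC (area under the ROC curve). As a reading aid, the following is a minimal sketch of how AUC can be computed from per-video labels and detector scores. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. The sketch uses the pairwise definition, i.e. the probability that a randomly chosen fake video receives a higher score than a randomly chosen real one, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Return AUC as a fraction in [0, 1].

    labels: 1 for fake, 0 for real (an assumed convention).
    scores: higher means "more likely fake".
    """
    fake = [s for y, s in zip(labels, scores) if y == 1]
    real = [s for y, s in zip(labels, scores) if y == 0]
    if not fake or not real:
        raise ValueError("AUC needs at least one fake and one real sample")

    # Count (fake, real) pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for f in fake:
        for r in real:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake) * len(real))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any benchmark.
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
    auc = roc_auc(toy_labels, toy_scores)
    print(f"AUC as fraction: {auc:.4f}")
    print(f"AUC as percent:  {100 * auc:.2f}")
```

The two print formats matter when reading the table: some rows appear to use the fraction scale (for example 0.928) while others appear to use the percent scale (for example 81.8), so values should be brought to a common scale before they are compared. The quadratic pairwise loop is chosen for clarity; a rank-based formulation would be preferable for large test sets.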
