Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection

About

Current Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in understanding multimodal data, but their potential remains underexplored for deepfake detection due to the misalignment between their knowledge and forensic patterns. To this end, we present a novel framework that unlocks LVLMs' capabilities for deepfake detection. Our framework includes a Knowledge-guided Forgery Detector (KFD), a Forgery Prompt Learner (FPL), and a Large Language Model (LLM). The KFD calculates correlations between image features and pristine/deepfake image description embeddings, enabling forgery classification and localization. The outputs of the KFD are then processed by the FPL to construct fine-grained forgery prompt embeddings. These embeddings, along with visual and question prompt embeddings, are fed into the LLM to generate textual detection responses. Extensive experiments on multiple benchmarks, including FF++, CDF2, DFD, DFDCP, DFDC, and DF40, demonstrate that our scheme surpasses state-of-the-art methods in generalization performance, while also supporting multi-turn dialogue.
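The correlation step attributed to the KFD can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `forgery_scores`, the `(H, W, D)` patch layout, the temperature value, and the mean pooling used for the image-level score are all assumptions made for illustration. It shows only the general idea of comparing per-patch image features against one "pristine" and one "deepfake" description embedding to obtain a localization map and a classification score.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def forgery_scores(patch_feats, text_embs, temperature=0.07):
    """Correlate image patch features with [pristine, deepfake] text embeddings.

    patch_feats: (H, W, D) array of per-patch image features (hypothetical layout).
    text_embs:   (2, D) array; row 0 = pristine description, row 1 = deepfake.
    Returns (fake_map, fake_prob): an (H, W) map of per-patch deepfake
    probabilities and a scalar image-level score.
    """
    h, w, d = patch_feats.shape
    patches = l2_normalize(patch_feats.reshape(h * w, d))
    texts = l2_normalize(text_embs)
    sims = patches @ texts.T                      # (H*W, 2) cosine similarities
    probs = softmax(sims / temperature, axis=-1)  # pristine vs. deepfake per patch
    fake_map = probs[:, 1].reshape(h, w)          # localization map
    fake_prob = float(fake_map.mean())            # mean pooling is an assumption
    return fake_map, fake_prob


# Toy usage: random features stand in for real encoder outputs.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(2, 16))
patch_feats = rng.normal(size=(4, 4, 16))
patch_feats[1:3, 1:3] = text_embs[1]  # plant a "deepfake-like" 2x2 region
fake_map, fake_prob = forgery_scores(patch_feats, text_embs)
```

In this toy setup the planted 2x2 region aligns with the deepfake embedding, so its entries in `fake_map` exceed 0.5. In a real system the features and description embeddings would come from trained image and text encoders rather than random draws.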

Peipeng Yu, Jianwei Fei, Hui Gao, Xuan Feng, Zhihua Xia, Chip Hong Chang • 2025

Related benchmarks

Task	Dataset	Result
Deepfake Detection	DFDC	AUC 79.12	230
Deepfake Detection	CDF v2	AUC 0.9471	97
Face Forgery Detection	DFDC	--	74
Deepfake Detection	DFDCP	AUC 0.9181	35
Video-level Deepfake Detection	DFDC	AUC 0.791	34
Face Forgery Detection	Celeb-DF v2	Video-level AUC 94.71	33
Video-level Deepfake Detection	DFD	AUC 0.996	25
Face Forgery Detection	Celeb-DF v1	Video-level AUC 97.62	19
Video-level Deepfake Detection	CDF2	AUC 94.7	13
Video-level Deepfake Detection	DFDCP	AUC 91.8	12
