Generalizable Face Forgery Detection via Separable Prompt Learning

About

Detecting face forgeries with CLIP has recently emerged as a promising and increasingly popular research direction. Because CLIP acquires rich visual knowledge through large-scale pretraining, most existing methods rely on its visual encoder and pay limited attention to the text modality. Given the instructive nature of the text modality, we posit that, with careful design, it can be leveraged to guide Deepfake detection. Accordingly, we shift the focus from the visual modality to the text modality and propose a new Separable Prompt Learning strategy (SePL) that enables CLIP to serve as an effective face forgery detector. The core idea of SePL is to disentangle forgery-specific and forgery-irrelevant information in images via two types of prompt learning, with the former enhancing detection. To achieve this disentanglement, we describe a cross-modality alignment strategy and a set of dedicated objectives. Extensive experiments demonstrate that, with this simple adaptation, our method achieves competitive and even superior performance compared to other methods under both cross-dataset and cross-method evaluation, highlighting its strong generalizability. The code has been released at https://github.com/OUC-YER/SePL-DeepfakeDetection

Enrui Yang, Yuezun Li • 2026

Related benchmarks

Task	Dataset	Result
Deepfake Detection	DFDC	AUC 86.6	230
Deepfake Detection	DFD	AUC 0.972	193
Deepfake Detection	CelebDF v2	AUC 0.96	134
Fake Image Detection	UniversalFakeDetect (test)	Pro-GAN Detection Rate 100	52
Deepfake Detection	UniFace	AUC 98.1	30
Deepfake Detection	BleFace	AUC 96	30
Deepfake Detection	MobSwap	AUC 98.5	30
Deepfake Detection	FaceDan	AUC 97.4	30
Deepfake Detection	InSwap	AUC 97.8	30
Deepfake Detection	SimSwap	AUC 97.3	30

Showing 10 of 14 rows
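
Most rows above report AUC, the area under the ROC curve. As a reading aid, here is a minimal sketch of how AUC can be computed from per-image labels and detector scores. It is not taken from the SePL code release: the function name `roc_auc`, the label convention (1 = fake, 0 = real), and the example scores are assumptions made for illustration. The result lies in [0, 1]; some rows above appear to list it on a 0 to 100 scale, which would correspond to multiplying by 100.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: the probability that a randomly chosen fake image
    receives a higher score than a randomly chosen real image.
    Ties between a fake and a real score count as half a win.
    Assumed convention (hypothetical): label 1 = fake, label 0 = real.
    """
    fake_scores = [s for y, s in zip(labels, scores) if y == 1]
    real_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not fake_scores or not real_scores:
        raise ValueError("AUC needs at least one fake and one real sample")

    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


# Illustrative, made-up scores: three fakes and two reals.
example_labels = [1, 1, 0, 0, 1]
example_scores = [0.9, 0.4, 0.35, 0.8, 0.7]
example_auc = roc_auc(example_labels, example_scores)
print(f"AUC = {example_auc:.4f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; for large test sets a rank-based formulation or a library routine is the usual choice.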
