PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution

About

Tracing the source of forged faces has gained significant attention due to the rapid advancement of generative models. However, existing deepfake attribution (DFA) works primarily focus on the interaction among various domains within the vision modality, leaving other modalities such as text and face parsing underexplored. They also rarely assess, in a fine-grained manner, how well deepfake attributors generalize to unseen advanced generators such as diffusion models. In this paper, we propose a novel parsing-aware vision language model with dynamic contrastive learning (PVLM) for zero-shot deepfake attribution (ZS-DFA), which enables effective and fine-grained traceability to unseen advanced generators. Specifically, we construct a novel and fine-grained ZS-DFA benchmark to evaluate the attribution performance of deepfake attributors on unseen advanced generators such as diffusion models. We further propose an innovative PVLM attributor based on a vision-language model to capture general and diverse attribution features. We are motivated by the observation that the preservation of source face attributes varies significantly across facial images generated by GAN and diffusion models. We exploit these inherent differences in facial attribute preservation to capture face parsing-aware forgery representations. To this end, we devise a novel parsing encoder that focuses on global face attribute embeddings, enabling parsing-guided DFA representation learning via dynamic vision-parsing matching. Additionally, we present a novel deepfake attribution contrastive center loss that pulls relevant generators closer and pushes irrelevant ones away, and that can be introduced into DFA models to enhance traceability. Experimental results show that our model exceeds the state of the art on the ZS-DFA benchmark under various evaluation protocols.
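
The abstract does not give the exact form of the deepfake attribution contrastive center loss, so the following is only a minimal sketch of a generic contrastive-center-style objective, under stated assumptions: each generator class has one center, the loss is the ratio of the squared distance to the own-class center over the summed squared distances to all other centers, and `delta` is a small constant that keeps the denominator positive. The function names, the `delta` parameter, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_center_loss(
    features: List[Sequence[float]],
    labels: List[int],
    centers: List[Sequence[float]],
    delta: float = 1.0,  # assumed stabilising constant, not from the paper
) -> float:
    """Mean over samples of 0.5 * intra / (inter + delta).

    intra: squared distance from a feature to its own generator's center.
    inter: summed squared distance from the feature to every other center.
    Minimising the ratio pulls a feature toward its own center and pushes
    it away from the centers of other generators.
    """
    total = 0.0
    for feat, label in zip(features, labels):
        intra = sq_dist(feat, centers[label])
        inter = sum(
            sq_dist(feat, c) for k, c in enumerate(centers) if k != label
        )
        total += 0.5 * intra / (inter + delta)
    return total / len(features)


# Toy example: two generator centers on a line (hypothetical values).
toy_centers = [[0.0, 0.0], [4.0, 0.0]]
near = contrastive_center_loss([[1.0, 0.0]], [0], toy_centers)
far = contrastive_center_loss([[3.0, 0.0]], [0], toy_centers)
```

In this toy setup the feature close to its own center yields a smaller loss than the one that has drifted toward the other generator's center, which is the pull-closer, push-away behaviour the abstract describes. In a real model the centers would typically be learnable parameters and the loss would be computed on batched tensors.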

Yaning Zhang, Jiahe Zhang, Chunjie Ma, Weili Guan, Tian Gan, Zan Gao • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Deepfake Attribution	DF40 and FFHQ unseen generators	SimSwap Accuracy (%)	72.05	54
Attribution	WildDeepfake	Accuracy (%)	68.56	34
Deepfake Attribution	LivePortrait unseen	Accuracy (%)	78.08	20
Deepfake Attribution	LIA unseen	Accuracy (%)	82.56	20
Deepfake Attribution	FSRT unseen	Accuracy (%)	86.88	20
Deepfake Attribution	DiffusionAct unseen	Accuracy (%)	83.6	20
Deepfake Attribution	EDTalk unseen	Accuracy (%)	88.4	20
Deepfake Attribution	AniTalker unseen	Accuracy (%)	74.88	20
Deepfake Attribution	Real3DPortrait unseen	Accuracy (%)	92.32	20
Deepfake Attribution	FLOAT unseen	Accuracy (%)	88.48	20

Showing 10 of 18 rows
