
PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution

About

Tracing the source of forged faces has gained significant attention due to the rapid advancement of generative models. However, existing deepfake attribution (DFA) works primarily focus on the interaction among various domains in the vision modality, and other modalities such as text and face parsing are not fully explored. Moreover, they rarely assess, in a fine-grained manner, how well deepfake attributors generalize to unseen advanced generators such as diffusion models. In this paper, we propose a novel parsing-aware vision language model with dynamic contrastive learning (PVLM) for zero-shot deepfake attribution (ZS-DFA), which facilitates effective and fine-grained traceability to unseen advanced generators. Specifically, we construct a novel, fine-grained ZS-DFA benchmark to evaluate the attribution performance of deepfake attributors on unseen advanced generators such as diffusion models. We also propose an innovative PVLM attributor based on a vision-language model to capture general and diverse attribution features. We are motivated by the observation that the preservation of source face attributes in facial images generated by GAN and diffusion models varies significantly, and we exploit these inherent differences in facial attribute preservation to capture face parsing-aware forgery representations. To this end, we devise a novel parsing encoder that focuses on global face attribute embeddings, enabling parsing-guided DFA representation learning via dynamic vision-parsing matching. Additionally, we present a novel deepfake attribution contrastive center loss that pulls relevant generators closer and pushes irrelevant ones away, and that can be introduced into DFA models to enhance traceability. Experimental results show that our model exceeds the state of the art on the ZS-DFA benchmark across various evaluation protocols.
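The abstract describes the contrastive center loss only at a high level (pull relevant generators closer, push irrelevant ones away) and does not give its formula. As a rough illustration of that idea, the sketch below uses a generic contrastive-center form: the squared distance to a sample's own generator center divided by the summed squared distances to all other centers. This form, the function names, and the `delta` stabilizer are assumptions for illustration, not the paper's actual loss.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_center_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Sequence[Sequence[float]],
    delta: float = 1.0,
) -> float:
    """Generic contrastive-center loss (assumed form, not the paper's).

    For each sample, the numerator shrinks as the feature approaches the
    center of its own generator class (pull), and the denominator grows as
    it moves away from every other generator's center (push).
    `delta` keeps the denominator strictly positive.
    """
    total = 0.0
    for feat, label in zip(features, labels):
        pull = squared_distance(feat, centers[label])
        push = sum(
            squared_distance(feat, center)
            for k, center in enumerate(centers)
            if k != label
        )
        total += 0.5 * pull / (push + delta)
    return total / len(features)


# Hypothetical 2-D features for two generator classes with fixed centers.
centers = [[0.0, 0.0], [4.0, 0.0]]
labels = [0, 1]
tight = [[0.1, 0.0], [3.9, 0.0]]   # close to their own centers
loose = [[1.5, 0.0], [2.5, 0.0]]   # drifting toward the other center

tight_loss = contrastive_center_loss(tight, labels, centers)
loose_loss = contrastive_center_loss(loose, labels, centers)
```

In this toy setup the features that sit near their own centers receive a lower loss than the ones drifting toward the other class. In a trained model the centers would typically be learnable parameters updated alongside the network, which this sketch leaves out.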

Yaning Zhang, Jiahe Zhang, Chunjie Ma, Weili Guan, Tian Gan, Zan Gao • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Deepfake Attribution | DF40 and FFHQ unseen generators | SimSwap Accuracy | 72.05 | 54 |
| Attribution | WildDeepfake | Accuracy | 68.56 | 34 |
| Deepfake Attribution | LivePortrait unseen | Accuracy | 78.08 | 20 |
| Deepfake Attribution | LIA unseen | Accuracy | 82.56 | 20 |
| Deepfake Attribution | FSRT unseen | Accuracy (%) | 86.88 | 20 |
| Deepfake Attribution | DiffusionAct unseen | Accuracy | 83.6 | 20 |
| Deepfake Attribution | EDTalk unseen | Accuracy | 88.4 | 20 |
| Deepfake Attribution | AniTalker unseen | Accuracy | 74.88 | 20 |
| Deepfake Attribution | Real3DPortrait unseen | Accuracy | 92.32 | 20 |
| Deepfake Attribution | FLOAT unseen | Accuracy | 88.48 | 20 |
Showing 10 of 18 rows
