Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs

About

Image Deepfake Detection (IDD) separates manipulated images from authentic ones by spotting artifacts of synthesis or tampering. Although large vision-language models (LVLMs) offer strong image understanding, adapting them to IDD often demands costly fine-tuning and generalizes poorly to diverse, evolving manipulations. We propose the Semantic Consistent Evidence Pack (SCEP), a training-free LVLM framework that replaces whole-image inference with evidence-driven reasoning. SCEP mines a compact set of suspicious patch tokens that best reveal manipulation cues. It uses the vision encoder's CLS token as a global reference, clusters patch features into coherent groups, and scores patches with a fused metric that combines CLS-guided semantic mismatch with frequency- and noise-based anomalies. To cover dispersed traces and avoid redundancy, SCEP samples a few high-confidence patches per cluster and applies grid-based non-maximum suppression (NMS), producing an evidence pack that conditions a frozen LVLM for prediction. Experiments on diverse benchmarks show SCEP outperforms strong baselines without LVLM fine-tuning.
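The patch-selection steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`build_evidence_pack`, `kmeans`, `minmax`), the min-max normalization, the equal-weight fusion `alpha`, the use of plain k-means for clustering, and the Chebyshev-distance form of grid NMS are all assumptions made for illustration. The `anomaly` input stands in for whatever frequency- and noise-based score the method actually computes.

```python
import numpy as np


def minmax(x):
    """Rescale a 1-D array to [0, 1]; a constant array maps to zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def kmeans(feats, k, iters=10, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(0)
    return labels


def build_evidence_pack(patches, cls, anomaly, grid_hw,
                        k=4, per_cluster=2, alpha=0.5, radius=1):
    """Return (kept patch indices, fused scores) for an h x w patch grid.

    patches: (h*w, d) patch features in row-major grid order
    cls:     (d,) global CLS feature used as the semantic reference
    anomaly: (h*w,) hypothetical frequency/noise anomaly score per patch
    """
    patches = np.asarray(patches, dtype=float)
    cls = np.asarray(cls, dtype=float)
    h, w = grid_hw
    assert patches.shape[0] == h * w

    # Semantic mismatch: 1 - cosine similarity between each patch and CLS.
    unit = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    ref = cls / (np.linalg.norm(cls) + 1e-8)
    mismatch = 1.0 - unit @ ref

    # Fused score (assumed form): weighted sum of the normalized cues.
    score = (alpha * minmax(mismatch)
             + (1 - alpha) * minmax(np.asarray(anomaly, dtype=float)))

    # Take the top-scoring patches from every cluster to cover dispersed traces.
    labels = kmeans(patches, k)
    candidates = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        top = members[np.argsort(score[members])[::-1][:per_cluster]]
        candidates.extend(top.tolist())

    # Grid NMS: visit candidates by descending score and drop any patch
    # within `radius` grid cells (Chebyshev distance) of a kept patch.
    candidates.sort(key=lambda i: -score[i])
    kept = []
    for i in candidates:
        r, col = divmod(i, w)
        if all(max(abs(r - j // w), abs(col - j % w)) > radius for j in kept):
            kept.append(i)
    return kept, score
```

In this sketch the returned indices would pick out the patch tokens passed to the frozen LVLM alongside the prompt; how the evidence pack is actually formatted for the model is not specified here.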

Yuxin Liu, Fei Wang, Kun Li, Yiqi Nie, Junjie Chen, Zhangling Duan, Zhaohong Jia • 2026

Related benchmarks

Task                      Dataset                              Metric                   Result  Rank
Image Deepfake Detection  DFBench Overall (Full partition)     Accuracy                 54.44   9
Image Deepfake Detection  LIVE                                 Accuracy                 91.74   9
Image Deepfake Detection  CSIQ                                 Accuracy                 90.03   9
Image Deepfake Detection  TID 2013                             Accuracy                 88.32   9
Image Deepfake Detection  KADID                                Accuracy                 89.92   9
Image Deepfake Detection  DFBench AI-Edited (test)             Object Enhance Accuracy  57.32   9
Image Deepfake Detection  KonIQ-10k                            Accuracy                 85.32   9
Image Deepfake Detection  DFBench Playground AI-generated 1.0  Accuracy                 43.33   8
Image Deepfake Detection  DFBench SD3.5 Large (AI-generated)   Accuracy                 36.41   8
