
ForgeryVCR: Visual-Centric Reasoning via Efficient Forensic Tools in MLLMs for Image Forgery Detection and Localization

About

Existing Multimodal Large Language Models (MLLMs) for image forgery detection and localization predominantly operate under a text-centric Chain-of-Thought (CoT) paradigm. However, forcing these models to textually characterize imperceptible low-level tampering traces inevitably leads to hallucinations, as linguistic modalities are insufficient to capture such fine-grained pixel-level inconsistencies. To overcome this, we propose ForgeryVCR, a framework that incorporates a forensic toolbox to materialize imperceptible traces into explicit visual intermediates via Visual-Centric Reasoning. To enable efficient tool utilization, we introduce a Strategic Tool Learning post-training paradigm, encompassing gain-driven trajectory construction for Supervised Fine-Tuning (SFT) and subsequent Reinforcement Learning (RL) optimization guided by a tool utility reward. This paradigm empowers the MLLM to act as a proactive decision-maker, learning to spontaneously invoke multi-view reasoning paths including local zoom-in for fine-grained inspection and the analysis of invisible inconsistencies in compression history, noise residuals, and frequency domains. Extensive experiments reveal that ForgeryVCR achieves state-of-the-art (SOTA) performance in both detection and localization tasks, demonstrating superior generalization and robustness with minimal tool redundancy. The project page is available at https://youqiwong.github.io/projects/ForgeryVCR/.
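To make the idea of "explicit visual intermediates" concrete, here is a minimal sketch of three of the views the abstract mentions: local zoom-in, a noise residual, and a frequency-domain view. This is not the ForgeryVCR toolbox. The function names (`zoom_in`, `noise_residual`, `log_spectrum`), the 3x3 mean filter, and the log-magnitude spectrum are illustrative assumptions chosen for brevity. The compression-history view is omitted because it needs a JPEG codec.

```python
import numpy as np

# Hypothetical helper names; the paper's actual tools may differ.

def zoom_in(image, box):
    """Crop a (top, left, bottom, right) region for closer inspection."""
    top, left, bottom, right = box
    return image[top:bottom, left:right]

def noise_residual(gray):
    """High-pass residual: the image minus a 3x3 mean-filtered copy.

    A mean filter is an assumption here; forensic pipelines often use
    more specialised high-pass or denoising filters.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    # Sum the nine shifted copies of the padded image, then average.
    smooth = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    return gray - smooth

def log_spectrum(gray):
    """Centered log-magnitude of the 2D Fourier spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(gray, dtype=np.float64)))
    return np.log1p(np.abs(spectrum))
```

Each function returns an array that can be rendered as an image and passed back to a model as an additional visual input, which is the general role the abstract assigns to its forensic tools.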

Youqi Wang, Shen Chen, Haowei Wang, Rongxuan Peng, Taiping Yao, Shunquan Tan, Changsheng Chen, Bin Li, Shouhong Ding • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Forgery Detection | CocoGlide | -- | 15 |
| Image-level Forgery Detection | CASIA v1 | F1 Score 91.93 | 11 |
| Image-level Forgery Detection | Coverage | F1 Score 79.63 | 11 |
| Image-level Forgery Detection | NIST16 | F1 Score 72.71 | 11 |
| Image-level Forgery Detection | Weighted Avg | F1 Score 82.71 | 11 |
| Pixel-level Forgery Localization | Coverage | F1 Score 67.32 | 11 |
| Pixel-level Forgery Localization | NIST 16 | F1 Score 0.5001 | 11 |
| Pixel-level Forgery Localization | in the wild | F1 Score 69.18 | 11 |
| Image-level Forgery Detection | DSO | F1 84.97 | 11 |
| Pixel-level Forgery Localization | CASIA v1 | F1 Score 70.92 | 11 |
Showing 10 of 30 rows
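
For readers unfamiliar with the metric, the sketch below shows how a pixel-level F1 score is commonly computed from a predicted binary mask and a ground-truth mask. The function name `pixel_f1` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention. The exact protocol behind the numbers above (thresholding, per-image versus dataset-level averaging, percentage versus fraction scale) is not stated on this page.

```python
import numpy as np

def pixel_f1(pred_mask, true_mask):
    """F1 between two binary masks: 2*TP / (2*TP + FP + FN)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.logical_and(pred, true).sum()    # tampered pixels found
    fp = np.logical_and(pred, ~true).sum()   # clean pixels flagged
    fn = np.logical_and(~pred, true).sum()   # tampered pixels missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return float(2 * tp / denom) if denom else 1.0
```

For example, a prediction that overlaps the ground truth on one pixel, flags one clean pixel, and misses one tampered pixel gives 2 / (2 + 1 + 1) = 0.5.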
