REVEAL: Reference-Grounded Reasoning for Multimodal Manipulation Detection

About

Multimodal manipulation detection aims to simultaneously identify forged image-text pairs and localize tampered regions, yet existing methods typically rely on memorizing isolated artifacts and struggle with imperceptible manipulation traces or domain shifts. Inspired by human comparative reasoning, we reformulate this task as a reference-grounded verification problem, in which authenticity is assessed by comparing a query against retrieved authentic evidence. We propose REVEAL (Reference-Enabled Verification for Evidence Analysis and Localization), a framework explicitly designed for this comparative paradigm. To support it, we construct a large-scale reference library of 170K authentic news image-text pairs featuring over 40K public figures. Technically, REVEAL employs a difference-aware fusion mechanism to capture fine-grained discrepancies between the query and the retrieved evidence. Furthermore, we introduce a task-decoupled Mixture-of-Experts (MoE) architecture that jointly performs instance-level detection and fine-grained grounding, effectively mitigating optimization conflicts between these heterogeneous objectives. Extensive experiments demonstrate that REVEAL significantly outperforms state-of-the-art methods and, notably, enables training-free domain adaptation by simply updating the reference library, offering a robust and practical solution for detecting evolving misinformation. Code is available at https://anonymous.4open.science/r/REVEAL-Reference-A006.
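The comparative idea in the abstract (judge a query by how it differs from retrieved authentic evidence) can be illustrated with a minimal sketch. It assumes the query and the reference library are already encoded as fixed-length embedding vectors. Several parts are hypothetical illustrations and are not REVEAL's actual modules, which this page does not specify: cosine-similarity retrieval, the `|q - r|` and `q * r` difference features, the softmax temperature, the hand-set discrepancy score, and every function name. The sketch also omits the paper's encoders and its task-decoupled MoE heads.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(query: Sequence[float],
             library: Sequence[Sequence[float]],
             k: int) -> List[Tuple[int, float]]:
    """Return the k (index, similarity) pairs in the library closest to the query."""
    scored = [(i, cosine(query, ref)) for i, ref in enumerate(library)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def difference_features(query: Sequence[float], reference: Sequence[float]) -> Vector:
    """Hypothetical difference-aware features: |q - r| concatenated with q * r."""
    diff = [abs(q - r) for q, r in zip(query, reference)]
    prod = [q * r for q, r in zip(query, reference)]
    return diff + prod


def reference_grounded_fusion(query: Sequence[float],
                              library: Sequence[Sequence[float]],
                              k: int = 3,
                              temperature: float = 0.1):
    """Similarity-weighted average of difference features over the top-k references."""
    hits = retrieve(query, library, k)
    weights = softmax([sim / temperature for _, sim in hits])
    fused = [0.0] * (2 * len(query))
    for w, (idx, _) in zip(weights, hits):
        for j, f in enumerate(difference_features(query, library[idx])):
            fused[j] += w * f
    return fused, hits


def discrepancy_score(fused: Sequence[float]) -> float:
    """Mean of the |q - r| half of the fused features; 0.0 means the query matches its evidence."""
    half = len(fused) // 2
    return sum(fused[:half]) / half


if __name__ == "__main__":
    # Toy "authentic" reference embeddings (hypothetical values).
    library = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]

    # A query identical to a stored reference shows no discrepancy.
    fused, hits = reference_grounded_fusion([1.0, 0.0, 0.0], library, k=1)
    print(hits)                      # [(0, 1.0)]
    print(discrepancy_score(fused))  # 0.0

    # A perturbed query differs from its nearest evidence, so the score rises.
    fused, hits = reference_grounded_fusion([0.9, 0.3, 0.0], library, k=2)
    print(discrepancy_score(fused) > 0)  # True
```

In this toy version, adding or replacing entries in `library` changes the evidence a query is compared against without retraining anything. That loosely mirrors the training-free domain adaptation the abstract describes. A learned system would feed the fused features to trained detection and grounding heads rather than to a hand-set score.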

Jun Zhou, Bingwen Hu, Yaxiong Wang, Zhedong Zheng, Yongzhen Wang, Yuchen Zhang, Ping Liu • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Reference-grounded News Verification | BBC | Classification Accuracy (ACCcls) | 97.09 | 20
Reference-grounded News Verification | Guardian | Classification Accuracy | 97.7 | 20
Reference-grounded News Verification | USA Today | ACC (Classification) | 97.25 | 20
Reference-grounded News Verification | Wash. Post | Classification Accuracy (ACCcls) | 97.53 | 20
Binary Classification | DGM4 | AUC | 97.82 | 9
Image Grounding | DGM4 | IoUm | 85.51 | 9
Multi-Label Classification | DGM4 | mAP | 91.37 | 9
Text Grounding | DGM4 | Precision | 79.31 | 9
Binary Classification | SAMM (original image reference gallery) | AUC | 99.83 | 7
Manipulation detection | MDSM Guardian | Accuracy | 78.52 | 7

(10 of 24 benchmark results shown.)
