Revisiting Image Manipulation Localization under Realistic Manipulation Scenarios

About

With large models easing the labor-intensive manipulation process, image manipulations in today's real scenarios often entail a complex manipulation process, comprising a series of editing operations to create a deceptive image. However, existing image manipulation localization (IML) methods remain manipulation-process-agnostic, directly producing localization masks in a one-shot prediction paradigm without modeling the underlying editing steps. This one-shot paradigm compresses the high-dimensional compositional space into a single binary mask, inducing severe dimensional collapse, which forces the model to discard essential structural cues and ultimately leads to overfitting and degraded generalization. To address this, we are the first to reformulate image manipulation localization as a conditional sequence prediction task, proposing the RITA framework. RITA predicts manipulated regions layer by layer in an ordered manner, using each step's prediction as the condition for the next, thereby explicitly modeling temporal dependencies and hierarchical structures among editing operations. To enable training and evaluation, we synthesize multi-step manipulation data and construct a new benchmark, HSIM. We further propose the HSS metric to assess sequential order and hierarchical alignment. Extensive experiments show that: 1) RITA achieves SOTA generalization and robustness on traditional benchmarks; 2) it remains computationally efficient despite explicitly modeling multi-step sequences; and 3) it establishes a viable foundation for hierarchical, process-aware manipulation localization. Code and dataset are available at https://github.com/scu-zjz/RITA.
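The conditional sequence prediction idea described above can be illustrated with a minimal sketch. This is not the RITA implementation: `toy_step_predictor` is a hypothetical stand-in for a learned model, the toy image encoding (pixel value k marks a region edited at step k), the first-to-last layer order, and the "empty mask ends the sequence" stop rule are all assumptions made for illustration only.

```python
import numpy as np


def toy_step_predictor(image, prev_mask):
    """Hypothetical stand-in for a learned step model.

    In this toy encoding, pixel value k > 0 marks a region edited at step k.
    The previous mask is the only cue used to decide which step comes next.
    """
    prev_step = int(image[prev_mask].max()) if prev_mask.any() else 0
    return image == prev_step + 1


def predict_layers(image, step_predictor, max_steps=8):
    """Predict one mask per editing step, feeding each mask to the next step."""
    masks = []
    prev_mask = np.zeros(image.shape, dtype=bool)  # empty condition at the start
    for _ in range(max_steps):
        mask = step_predictor(image, prev_mask)
        if not mask.any():  # assumed stop rule: an empty mask ends the sequence
            break
        masks.append(mask)
        prev_mask = mask  # this step's prediction conditions the next step
    return masks


# Toy "image": 0 = untouched, 1..3 = region edited at that step.
image = np.array([
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 3],
])

layers = predict_layers(image, toy_step_predictor)
# The union of all layers recovers a conventional one-shot binary mask.
merged = np.logical_or.reduce(layers)
```

The point of the sketch is the loop structure: a one-shot method would output only `merged`, while the sequential formulation keeps the ordered list `layers`, so the order and nesting of the edits remain available.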

Xuekang Zhu, Ji-Zhe Zhou, Kaiwen Feng, Chenfan Qu, Xiwen Wang, Yunfei Wang, Liting Zhou, Jian Liu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Image Manipulation Localization | CAT-Net evaluation protocol (test) | Mean Binary F1 | 64.3 | 84 |
| Image Manipulation Localization | Coverage | F1 Score | 56.6 | 49 |
| Image Manipulation Localization | CAT-Net (test) | Mean Binary F1 | 64.3 | 42 |
| Image Manipulation Localization | Columbia | F1 Score | 92.1 | 42 |
| Image Manipulation Localization | CASIA v1 | F1 Score | 77 | 36 |
| Image Manipulation Localization | CocoGlide | F1 Score | 53.3 | 12 |
| Image Manipulation Localization | AutoSplice | F1 Score | 66.4 | 12 |
| Image Manipulation Localization | IMD 2020 | F1 Score | 37.9 | 6 |
| Image Manipulation Localization | HSIM (test) | Parameters (M) | 55.567 | 6 |
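For readers unfamiliar with the F1 scores reported above, the following is a minimal sketch of how a pixel-level binary F1 is commonly computed between a predicted mask and a ground-truth mask. The exact protocol behind each listed number (binarization threshold, per-image averaging, handling of empty masks) is not stated on this page, so the function name `binary_f1` and the zero-denominator convention here are assumptions.

```python
import numpy as np


def binary_f1(pred, gt):
    """Pixel-level F1 between two binary masks (1 = manipulated pixel)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()    # manipulated pixels correctly flagged
    fp = np.logical_and(pred, ~gt).sum()   # authentic pixels wrongly flagged
    fn = np.logical_and(~pred, gt).sum()   # manipulated pixels that were missed
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when both masks are empty.
    return float(2 * tp / denom) if denom else 0.0


pred = [[1, 1, 0],
        [0, 0, 0]]
gt = [[1, 0, 0],
      [1, 0, 0]]

score = binary_f1(pred, gt)  # tp=1, fp=1, fn=1, so F1 = 2 / 4 = 0.5
```

A "mean" binary F1 would then typically be the average of this per-image score over a test set, with results such as 64.3 presumably expressed as percentages.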