Revisiting Image Manipulation Localization under Realistic Manipulation Scenarios

About

With large models easing the labor-intensive manipulation process, image manipulations in today's real scenarios often involve a complex pipeline: a series of editing operations that together produce a deceptive image. However, existing image manipulation localization (IML) methods remain manipulation-process-agnostic, directly producing localization masks in a one-shot prediction paradigm without modeling the underlying editing steps. This one-shot paradigm compresses the high-dimensional compositional space into a single binary mask, inducing severe dimensional collapse, which forces the model to discard essential structural cues and ultimately leads to overfitting and degraded generalization. To address this, we are the first to reformulate image manipulation localization as a conditional sequence prediction task, and we propose the RITA framework. RITA predicts manipulated regions layer by layer in an ordered manner, using each step's prediction as the condition for the next, thereby explicitly modeling temporal dependencies and hierarchical structures among editing operations. To enable training and evaluation, we synthesize multi-step manipulation data and construct a new benchmark, HSIM. We further propose the HSS metric to assess sequential order and hierarchical alignment. Extensive experiments show that: 1) RITA achieves SOTA generalization and robustness on traditional benchmarks; 2) it remains computationally efficient despite explicitly modeling multi-step sequences; and 3) it establishes a viable foundation for hierarchical, process-aware manipulation localization. Code and dataset are available at https://github.com/scu-zjz/RITA.
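The layer-by-layer conditioning described above can be illustrated with a minimal sketch. This is not the RITA implementation: `predict_layers`, `toy_step`, and the integer "edit depth" image are hypothetical stand-ins chosen only to show the control flow, in which each predicted mask becomes the condition for the next step and prediction stops when a step returns an empty mask.

```python
from typing import Callable, List

Image = List[List[int]]
Mask = List[List[bool]]


def predict_layers(
    image: Image,
    step_fn: Callable[[Image, Mask], Mask],
    max_steps: int = 4,
) -> List[Mask]:
    """Predict an ordered sequence of masks, one per editing layer.

    step_fn is a hypothetical placeholder for a learned model: it takes the
    image and the previous step's mask and returns the next mask.
    """
    # The first step is conditioned on an empty mask.
    prev: Mask = [[False] * len(row) for row in image]
    layers: List[Mask] = []
    for _ in range(max_steps):
        mask = step_fn(image, prev)
        if not any(any(row) for row in mask):
            break  # an empty prediction ends the sequence
        layers.append(mask)
        prev = mask  # this step's output conditions the next step
    return layers


def toy_step(image: Image, prev: Mask) -> Mask:
    """Toy rule (assumption): pixel values encode how many edits touched them.

    The next layer keeps pixels strictly deeper than the shallowest pixel
    inside the previous mask.
    """
    inside = [v for row, mrow in zip(image, prev) for v, m in zip(row, mrow) if m]
    level = min(inside) + 1 if inside else 1
    return [[v >= level for v in row] for row in image]


image = [
    [0, 1, 1],
    [0, 2, 2],
    [0, 0, 3],
]
layers = predict_layers(image, toy_step)
sizes = [sum(sum(row) for row in mask) for mask in layers]
print(sizes)  # [5, 3, 1]
```

In this toy setting each later mask is nested inside the previous one, which mirrors the hierarchical structure the abstract refers to; a real model would replace `toy_step` with a learned predictor.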

Xuekang Zhu, Ji-Zhe Zhou, Kaiwen Feng, Chenfan Qu, Xiwen Wang, Yunfei Wang, Liting Zhou, Jian Liu • 2025

Related benchmarks

Task	Dataset	Result
Image Manipulation Localization	CAT-Net evaluation protocol (test)	Mean Binary F1 64.3	84
Image Manipulation Localization	Coverage	F1 Score 56.6	60
Image Manipulation Localization	CAT-Net (test)	Mean Binary F1 64.3	42
Image Manipulation Localization	Columbia	F1 Score 92.1	42
Image Manipulation Localization	CASIA v1	F1 Score 77	36
Image Manipulation Localization	CocoGlide	F1 Score 53.3	24
Image Manipulation Localization	AutoSplice	F1 Score 66.4	24
Image Manipulation Localization	IMD 2020	F1 Score 37.9	6
Image Manipulation Localization	HSIM (test)	Parameters (M) 55.567	6
