
SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization

About

Text-driven image editing has advanced rapidly, but reliably localizing these manipulations requires image manipulation localization (IML) models trained on large pixel-annotated datasets, and there is still no low-cost way to obtain such training data at scale. We observe that these data already exist in disguise: public editing datasets contain millions of (original, edited) pairs that are structurally identical to IML training samples and lack only pixel-level masks. Recovering these masks automatically is non-trivial. Pixel differencing is overwhelmed by diffusion-induced perturbations that spread across all pixels, and instruction-only grounding localizes only what the prompt describes, so it misses unintended side-effects of the editor. We propose SIGMA (Semantic-difference Instruction-Grounding Mask Annotator), which performs semantic-feature differencing in a vision foundation backbone and injects an instruction-derived spatial prior into this visual stream via bidirectional cross-modal refinement, amplifying the difference signal at intended-edit regions when the editor faithfully realizes user intent. SIGMA is trained in two complementary stages: Stage I is supervised on inpainting masks; Stage II closes the diffusion-domain shift via VAE-roundtrip noise calibration, EMA self-training, and an edit-noise disentanglement loss. SIGMA outperforms existing automatic mask generators on five benchmarks (+12.20% F1, +11.16% IoU). Applied to public editing corpora, it produces an IML training set of ~1.1M samples that improves six diverse detectors by +18.34% F1 across five datasets, turning previously unused editing data into a model-agnostic supervisory resource for IML. The full codebase will be released once the paper is accepted.
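The abstract contrasts raw pixel differencing with semantic-feature differencing. The sketch below illustrates only the generic idea of comparing per-patch feature vectors rather than pixels, then thresholding the distance into a coarse mask. It assumes the features were already extracted by some vision backbone; the `orig`/`edited` grids, the helper names, and the threshold `tau` are hypothetical illustrations, not values or code from the paper. It deliberately omits SIGMA's instruction-derived prior, the cross-modal refinement, and both training stages.

```python
import math
from typing import List

Vector = List[float]
FeatureGrid = List[List[Vector]]  # rows x cols of per-patch feature vectors (hypothetical input)


def cosine_distance(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Return 1 - cosine similarity; near 0 for same-direction features."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def semantic_difference_map(orig: FeatureGrid, edited: FeatureGrid) -> List[List[float]]:
    """Per-patch cosine distance between original and edited features."""
    return [
        [cosine_distance(fo, fe) for fo, fe in zip(row_o, row_e)]
        for row_o, row_e in zip(orig, edited)
    ]


def threshold_mask(diff: List[List[float]], tau: float) -> List[List[int]]:
    """Binarize the difference map; 1 marks a patch flagged as edited (tau is an assumption)."""
    return [[1 if d > tau else 0 for d in row] for row in diff]


if __name__ == "__main__":
    # Toy 2x2 feature grids: three patches carry small "noise-like" shifts,
    # while the bottom-right patch is semantically replaced.
    orig = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 0.0]]]
    edited = [[[1.0, 0.05], [0.05, 1.0]], [[1.0, 0.95], [0.0, 1.0]]]
    diff = semantic_difference_map(orig, edited)
    print(threshold_mask(diff, tau=0.5))  # → [[0, 0], [0, 1]]
```

In this toy case the small shifts yield distances near 0.001, while the replaced patch yields 1.0, so a single threshold separates them. Real diffusion noise and real edits are far less cleanly separated, which is the gap the abstract says SIGMA's instruction prior and Stage II calibration are meant to address.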

Peiyu Zhuang, Jianquan Yang, Haodong Li, Zhuoying Cai, Ruitao Xie, Jishen Zeng, Baoying Chen, Jiwu Huang, Xiaochun Cao• 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Manipulation Localization | AutoSplice | F1 Score | 63.42 | 24 |
| Image Manipulation Localization | CocoGlide | F1 Score | 0.5872 | 24 |
| Image Manipulation Localization | MagicBrush | F1 Score | 81.95 | 21 |
| Image Manipulation Localization | DEAL-300K | F1 Score | 29.99 | 12 |
| Image Manipulation Localization | OpenSDI | F1 Score | 27.02 | 12 |
| Change Detection | AutoSplice | F1 Score | 96.69 | 10 |
| Change Detection | DEAL-300K | F1 Score | 76.43 | 10 |
| Change Detection | OpenSDI | F1 Score | 91.37 | 10 |
| Change Detection | CocoGlide | F1 Score | 95.62 | 10 |
| Change Detection | MagicBrush | F1 Score | 89.03 | 10 |
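The benchmarks above report F1, and the abstract also reports IoU. A minimal sketch of pixel-level F1 and IoU for a single pair of binary masks follows, assuming the common per-image definitions F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The paper's exact protocol (binarization threshold, averaging across images, handling of empty masks) is not stated here and may differ. The empty-mask convention in the code is an assumption. Both functions return values in [0, 1]; multiply by 100 for the percentage form used in most rows of the table.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = manipulated pixel, 0 = authentic pixel


def confusion_counts(pred: Mask, gt: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives pixel by pixel."""
    tp = fp = fn = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1
    return tp, fp, fn


def f1_and_iou(pred: Mask, gt: Mask) -> Tuple[float, float]:
    """Pixel-level F1 and IoU for one image."""
    tp, fp, fn = confusion_counts(pred, gt)
    if tp + fp + fn == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


if __name__ == "__main__":
    gt = [[1, 1, 0], [0, 0, 0]]
    pred = [[1, 0, 0], [1, 0, 0]]  # 1 hit, 1 false alarm, 1 miss
    print(f1_and_iou(pred, gt))  # F1 = 0.5, IoU = 1/3
```

For a single image the two scores are linked by F1 = 2·IoU / (1 + IoU), so they rank predictions identically. Once averaged over a dataset, however, the two can diverge.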
