
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

About

Instruction-based image editing enables precise modifications via natural language prompts, but existing methods face a precision-efficiency tradeoff: fine-tuning demands massive datasets (>10M) and computational resources, while training-free approaches suffer from weak instruction comprehension. We address this by proposing ICEdit, which leverages the inherent comprehension and generation abilities of large-scale Diffusion Transformers (DiTs) through three key innovations: (1) an in-context editing paradigm without architectural modifications; (2) minimal parameter-efficient fine-tuning for quality improvement; (3) Early Filter Inference-Time Scaling, which uses VLMs to select high-quality noise samples for efficiency. Experiments show that ICEdit achieves state-of-the-art editing performance with only 0.1% of the training data and 1% of the trainable parameters compared to previous methods. Our approach establishes a new paradigm for balancing precision and efficiency in instructional image editing. Code and demos are available at https://river-zhang.github.io/ICEdit-gh-pages/.
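
The abstract describes Early Filter Inference-Time Scaling only at a high level: a VLM selects high-quality noise samples. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that a cheap preview can be produced from each candidate seed with a few denoising steps, that a scorer (standing in for the VLM) rates each preview, and that only the best-rated seed is fully denoised. All names here (`early_filter_select`, `partial_denoise`, `score_preview`, `full_denoise`, `preview_steps`) are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple


def early_filter_select(
    seeds: Sequence[int],
    partial_denoise: Callable[[int, int], object],
    score_preview: Callable[[object], float],
    full_denoise: Callable[[int], object],
    preview_steps: int = 4,
) -> Tuple[int, object]:
    """Score cheap previews of each seed, then fully denoise only the best one.

    The three callables are placeholders (assumptions) for a diffusion
    sampler and a VLM-based scorer; they are not part of any real library.
    """
    if not seeds:
        raise ValueError("need at least one candidate seed")

    # Cheap pass: run only a few denoising steps per candidate seed.
    scores: Dict[int, float] = {
        seed: score_preview(partial_denoise(seed, preview_steps))
        for seed in seeds
    }

    # Keep the seed whose preview scored highest (first one wins on ties).
    best_seed = max(seeds, key=lambda seed: scores[seed])

    # Expensive pass: full denoising is run once, for the selected seed only.
    return best_seed, full_denoise(best_seed)
```

With N candidate seeds, this costs N short preview runs plus a single full run instead of N full runs, which is where the efficiency gain of filtering early would come from under these assumptions.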

Zechuan Zhang, Ji Xie, Yu Lu, Zongxin Yang, Yi Yang • 2025

Related benchmarks

Task                                                 Dataset                   Metric                       Result  Rank
Image Editing                                        ImgEdit-Bench             Overall Score                3.05    132
Image Editing                                        KRIS-Bench                Factual Knowledge Score      0.4699  65
Image Editing                                        GEdit-Bench               Semantic Consistency         4.94    46
Instruction-based Image Editing                      ImgEdit Bench 1.0 (test)  Add Score                    3.58    37
Image-to-Image Translation (Appearance Divergence)   LAION Mini                Structure Similarity         96.8    20
Image-to-Image Translation (Appearance Consistency)  LAION Mini                Structure Similarity         0.954   20
Document Editing                                     MiLDEBench 1.0 (test)     Instruction Following Score  2.28    18
Single-image editing                                 GEdit EN (full)           BG Change                    2.73    15
Image Editing                                        ImgEdit (test)            Add Score                    3.58    14
Instruction-based Image Editing                      EmuEdit-bench (test)      CLIP-src Score               0.8912  13
Showing 10 of 40 rows
