Multi-turn Consistent Image Editing

About

Many real-world applications, such as interactive photo retouching, artistic content creation, and product design, require flexible and iterative image editing. However, existing image editing methods primarily focus on achieving the desired modifications in a single step, an approach that often struggles with ambiguous user intent, complex transformations, or the need for progressive refinements. As a result, these methods frequently produce inconsistent outcomes or fail to meet user expectations. To address these challenges, we propose a multi-turn image editing framework that enables users to iteratively refine their edits, progressively achieving more satisfactory results. Our approach leverages flow matching for accurate image inversion and a dual-objective Linear Quadratic Regulator (LQR) for stable sampling, effectively mitigating error accumulation. Additionally, by analyzing the layer-wise roles of transformers, we introduce an adaptive attention highlighting method that enhances editability while preserving multi-turn coherence. Extensive experiments demonstrate that our framework significantly improves edit success rates and visual fidelity compared to existing methods.

Zijun Zhou, Yingying Deng, Xiangyu He, Weiming Dong, Fan Tang • 2025

Related benchmarks

Task                       Dataset            Result        Rank
Text-based Image Editing   Complex-PIE-Bench  CLIP-T 25.03  7
Text-based Image Editing   PIE-Bench++        CLIP-T 23.87  7
Text-Guided Image Editing  User Study         SA 3.11       6