
ZONE: Zero-Shot Instruction-Guided Local Editing

About

Recent advances in vision-language models like Stable Diffusion have shown remarkable power in creative image synthesis and editing. However, most existing text-to-image editing methods encounter two obstacles. First, the text prompt needs to be carefully crafted to achieve good results, which is neither intuitive nor user-friendly. Second, they are insensitive to local edits and can irreversibly affect non-edited regions, leaving obvious editing traces. To tackle these problems, we propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE. We first convert the editing intent from the user-provided instruction (e.g., "make his tie blue") into specific image editing regions through InstructPix2Pix. We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segmentation model. We further develop an FFT-based edge smoother for seamless blending between the layer and the image. Our method allows for arbitrary manipulation of a specific region with a single instruction while preserving the rest. Extensive experiments demonstrate that ZONE achieves remarkable local editing results and user-friendliness, outperforming state-of-the-art methods. Code is available at https://github.com/lsl001006/ZONE.
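
The abstract only names the Region-IoU scheme and the FFT-based edge smoother, so the following is a minimal sketch of the general idea and not the authors' implementation (see the linked repository for that). It assumes the edit region and the segmentation candidates are binary masks, picks the candidate with the highest IoU against the edit region, and softens the chosen mask with a Gaussian low-pass filter applied in the frequency domain. All function names and the `sigma` parameter are hypothetical.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two binary masks of the same shape."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def select_mask(edit_region, candidate_masks):
    """Return (index, score) of the candidate that best overlaps the edit region.

    Assumption: a plain arg-max over IoU stands in for the paper's Region-IoU scheme.
    """
    scores = [iou(edit_region, m) for m in candidate_masks]
    best = int(np.argmax(scores))
    return best, scores[best]


def fft_smooth_mask(mask, sigma=0.1):
    """Soften mask edges with a Gaussian low-pass filter in the frequency domain.

    sigma is in cycles per pixel (hypothetical default); smaller means blurrier.
    """
    h, w = mask.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    lowpass = np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))
    smoothed = np.fft.ifft2(np.fft.fft2(mask.astype(float)) * lowpass).real
    return np.clip(smoothed, 0.0, 1.0)


def blend(layer, image, alpha):
    """Alpha-composite an edited layer over the original image (H x W x C arrays)."""
    a = alpha[..., None]
    return a * layer + (1.0 - a) * image
```

In this sketch the smoothed mask serves as a per-pixel alpha, so pixels well inside the selected region come from the edited layer, pixels well outside it stay untouched, and only a narrow band around the boundary is mixed.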

Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xuhui Liu, Jiaming Liu, Li Lin, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Instruction-guided image editing | MagicBrush single-turn (test) | CLIP Similarity (Image): 0.929 | 13 |
| Instruction-guided image editing | MagicBrush multi-turn (test) | CLIP-T: 0.307 | 7 |
| Instruction-guided image editing | ZONE (test) | CLIP-T: 0.296 | 7 |
| Image Editing | 100 evaluation samples (test) | L1 Loss: 0.0146 | 6 |
| Instruction-guided image editing | Human Evaluation User Study (test) | Success Rate (SR): 69.4 | 6 |
