DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

About

Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. Although these models offer diverse, high-quality generation capabilities, translating those abilities to fine-grained image editing remains challenging. In this paper, we propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing: (1) in complex scenarios, editing results often lack accuracy and exhibit unexpected artifacts; (2) existing methods lack the flexibility to harmonize editing operations, e.g., to imagine new content. In our solution, we introduce image prompts into fine-grained image editing, which cooperate with the text prompt to better describe the editing content. To increase flexibility while maintaining content consistency, we locally combine stochastic differential equation (SDE) sampling into the ordinary differential equation (ODE) sampling. In addition, we incorporate regional score-based gradient guidance and a time travel strategy into the diffusion sampling, further improving the editing quality. Extensive experiments demonstrate that our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks, including editing within a single image (e.g., object moving, resizing, and content dragging) and across images (e.g., appearance replacing and object pasting). Our source code is released at https://github.com/MC-E/DragonDiffusion.
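The idea of locally combining SDE and ODE sampling can be illustrated with a small sketch. This is not the authors' implementation: it assumes the regional behavior can be modeled as a standard DDIM update whose noise scale `eta` is given per pixel by a mask (0 outside the edited region for deterministic ODE-like sampling, 1 inside it for stochastic SDE-like sampling). The function name `regional_ddim_step`, the flat-list image representation, and the example values are all hypothetical.

```python
import math
import random


def regional_ddim_step(x_t, eps, alpha_t, alpha_prev, eta_mask, rng):
    """One DDIM-style update on a flat list of pixel values.

    x_t        : current noisy sample
    eps        : predicted noise for each pixel (would come from the model)
    alpha_t    : cumulative alpha at the current timestep
    alpha_prev : cumulative alpha at the previous (less noisy) timestep
    eta_mask   : per-pixel eta; 0 gives the deterministic (ODE) update,
                 1 gives the fully stochastic (SDE-like) update
    rng        : random.Random instance used to draw the injected noise
    """
    # Standard DDIM noise scale for eta = 1.
    base_sigma = math.sqrt((1 - alpha_prev) / (1 - alpha_t)) * math.sqrt(
        1 - alpha_t / alpha_prev
    )
    out = []
    for x, e, eta in zip(x_t, eps, eta_mask):
        sigma = eta * base_sigma
        # Estimate of the clean sample implied by the noise prediction.
        x0_pred = (x - math.sqrt(1 - alpha_t) * e) / math.sqrt(alpha_t)
        # Deterministic direction pointing back toward x_t.
        direction = math.sqrt(max(0.0, 1 - alpha_prev - sigma**2)) * e
        # Fresh noise is injected only where the mask is non-zero.
        noise = sigma * rng.gauss(0.0, 1.0)
        out.append(math.sqrt(alpha_prev) * x0_pred + direction + noise)
    return out


# Hypothetical 4-pixel example: the two middle pixels form the edited region.
x_t = [0.5, -0.2, 1.0, 0.3]
eps = [0.1, 0.0, -0.3, 0.2]
mask = [0.0, 1.0, 1.0, 0.0]
x_prev = regional_ddim_step(x_t, eps, 0.5, 0.7, mask, random.Random(0))
```

Under these assumptions, pixels outside the mask follow the same trajectory on every run, which is what keeps unedited content consistent, while pixels inside the mask receive fresh noise and can therefore vary between runs.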

Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang • 2024

Related benchmarks

Task	Dataset	Result
Image Editing	1024 x 1024 resolution	--	14
Drag-style image editing	PRD (Paired Region Dataset) benchmark 1.0 (test)	MSE 0.0959	9
Drag-based Image Editing	DragBench-DR 33	Mean Distance (MD) 25.85	8
Drag-based Image Editing	DragBench-SR 26	MD 25.77	8
Drag-based Image Editing	DragBench-SR and DragBench-DR User Study images	CP Win 53	7
3D Part-level Drag-based Generation	PartDrag-4D (evaluation)	PSNR 22.52	7
Overall Appearance Transfer Quality	Curated dataset 100 image pairs	DeQA 3.1191	6
Material Transfer	Curated dataset 100 image pairs	CLIP-T Score 0.2211	6
Semantic-Aware Appearance Transfer	Curated dataset 100 image pairs	CLIP-I 79.1	6
3D Part-level Drag-based Generation	Objaverse Animation-HQ (evaluation)	PSNR 19.46	5

Showing 10 of 13 rows
