Inversion-Free Image Editing with Natural Language

About

Despite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models. The primary bottlenecks are 1) the time-consuming inversion process; 2) the difficulty of balancing consistency with accuracy; and 3) the lack of compatibility with the efficient sampling methods used in consistency models. To address these issues, we start by asking whether the inversion process can be eliminated for editing. We show that when the initial sample is known, a special variance schedule reduces the denoising step to the same form as multi-step consistency sampling. We name this the Denoising Diffusion Consistent Model (DDCM), and note that it implies a virtual inversion strategy without explicit inversion in sampling. We further unify the attention control mechanisms in a tuning-free framework for text-guided editing. Combining them, we present inversion-free editing (InfEdit), which allows consistent and faithful editing for both rigid and non-rigid semantic changes, handling intricate modifications without compromising the image's integrity and without explicit inversion. In extensive experiments, InfEdit shows strong performance across various editing tasks and maintains a seamless workflow (less than 3 seconds on a single A40), demonstrating the potential for real-time applications. Project Page: https://sled-group.github.io/InfEdit/
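
The claim about the variance schedule can be checked numerically. The sketch below is not the authors' code: it assumes the standard generalized DDIM update with cumulative alphas, and the function names, the values of alpha_t and alpha_prev, and the toy 4x4 latent are all illustrative. Under those assumptions, setting sigma_t = sqrt(1 - alpha_prev) removes the deterministic direction term, so the step becomes "predict the clean sample, then re-noise it", which is the shape of a multi-step consistency sampling step. When the initial sample is known, the noise implied by it makes that prediction exact.

```python
import numpy as np


def predict_z0(z_t, eps, alpha_t):
    """Clean-sample estimate implied by a noise estimate eps."""
    return (z_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)


def ddim_step(z_t, eps, alpha_t, alpha_prev, sigma_t, noise):
    """Generalized DDIM update; alpha_* are cumulative products."""
    z0_pred = predict_z0(z_t, eps, alpha_t)
    # Clamp guards against a tiny negative value from rounding.
    coeff = np.sqrt(max(1.0 - alpha_prev - sigma_t**2, 0.0))
    return np.sqrt(alpha_prev) * z0_pred + coeff * eps + sigma_t * noise


def consistency_style_step(z_t, eps, alpha_t, alpha_prev, noise):
    """Predict the clean sample, then re-noise it to the previous level."""
    z0_pred = predict_z0(z_t, eps, alpha_t)
    return np.sqrt(alpha_prev) * z0_pred + np.sqrt(1.0 - alpha_prev) * noise


rng = np.random.default_rng(0)
alpha_t, alpha_prev = 0.5, 0.8           # illustrative noise levels
z0 = rng.standard_normal((4, 4))         # the known initial sample
z_t = np.sqrt(alpha_t) * z0 + np.sqrt(1.0 - alpha_t) * rng.standard_normal((4, 4))

# With z0 known, the noise can be computed directly, with no inversion pass.
eps_cons = (z_t - np.sqrt(alpha_t) * z0) / np.sqrt(1.0 - alpha_t)
z0_pred = predict_z0(z_t, eps_cons, alpha_t)

noise = rng.standard_normal((4, 4))
sigma_t = np.sqrt(1.0 - alpha_prev)      # the special variance choice
z_prev_ddim = ddim_step(z_t, eps_cons, alpha_t, alpha_prev, sigma_t, noise)
z_prev_cons = consistency_style_step(z_t, eps_cons, alpha_t, alpha_prev, noise)

print(np.allclose(z_prev_ddim, z_prev_cons), np.allclose(z0_pred, z0))
```

Both checks compare arrays with np.allclose, so they tolerate floating-point rounding. The sketch covers only the sampling identity; the attention control mechanisms that InfEdit also relies on are outside its scope.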

Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma, Joyce Chai • 2023

Related benchmarks

Task | Dataset | Result | Rank
Image Editing | PIE-Bench | PSNR 28.51 | 116
Image Editing | PIE-Bench (test) | PSNR 27.31 | 46
Image Editing | PIE-Bench | Distance (×10^3) 17.06 | 17
Image Editing | MagicBrush Single-Turn | L1 Loss 0.122 | 11
Conditional Generation | Controllable generation dataset ControlNet-supported 1.0 | Self-sim 0.117 | 8
Conditional Generation | Controllable generation dataset New condition 1.0 | Self-similarity 0.102 | 8
Image-to-Image Translation | Summer ↔ Winter 512x512 (test) | FID 75.63 | 7
Image-to-Image Translation | Horse ↔ Zebra 512x512 (test) | FID 61.81 | 7
Image Editing | MagicBrush Multi-Turn | L1 Loss 0.155 | 7
Non-rigid image editing | PIE benchmark | Distance 6.12 | 6