
Null-text Inversion for Editing Real Images using Guided Diffusion Models

About

Recent text-guided diffusion models provide powerful image generation capabilities. Currently, a massive effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. To edit a real image using these state-of-the-art tools, one must first invert the image with a meaningful text prompt into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image. Our proposed inversion consists of two novel key components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestamp and optimize around it. We demonstrate that a direct inversion is inadequate on its own, but does provide a good anchor for our optimization. (ii) NULL-text optimization, where we only modify the unconditional textual embedding that is used for classifier-free guidance, rather than the input text embedding. This allows for keeping both the model weights and the conditional embedding intact and hence enables applying prompt-based editing while avoiding the cumbersome tuning of the model's weights. Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing, showing high-fidelity editing of real images.

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or • 2022
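
The two components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the latent is a single scalar, `eps_model` is a hypothetical linear stand-in for the Stable Diffusion noise predictor, and the schedule `ALPHA_BAR`, the guidance scale of 7.5, and the use of guidance scale 1 during inversion are all assumptions made for illustration. It only shows the shape of the idea: DDIM inversion yields one pivot per timestep, and then only the unconditional ("null") embedding is optimized per timestep so that guided sampling returns to those pivots.

```python
import math

# Toy cumulative noise schedule (assumed values); index = timestep.
ALPHA_BAR = [0.99, 0.9, 0.7, 0.5, 0.3, 0.1]
T = len(ALPHA_BAR) - 1
GUIDANCE = 7.5  # assumed classifier-free guidance scale


def eps_model(z, emb):
    """Hypothetical noise predictor, linear in the embedding (stand-in for a U-Net)."""
    return 0.3 * z + emb


def ddim_coeffs(t_from, t_to):
    """Coefficients (c, k) of a deterministic DDIM move: z_to = c * z_from + k * eps."""
    a_from, a_to = ALPHA_BAR[t_from], ALPHA_BAR[t_to]
    c = math.sqrt(a_to / a_from)
    k = math.sqrt(1 - a_to) - c * math.sqrt(1 - a_from)
    return c, k


def guided_eps(z, cond, null, w=GUIDANCE):
    """Classifier-free guidance: extrapolate from the null to the conditional prediction."""
    e_u, e_c = eps_model(z, null), eps_model(z, cond)
    return e_u + w * (e_c - e_u)


def ddim_invert(z0, cond):
    """Pivotal trajectory z*_0 .. z*_T from DDIM inversion (conditional prediction only)."""
    traj = [z0]
    for t in range(T):
        c, k = ddim_coeffs(t, t + 1)
        traj.append(c * traj[-1] + k * eps_model(traj[-1], cond))
    return traj


def null_text_optimize(pivots, cond, null_init=0.0, lr=0.02, iters=500):
    """Tune only the null embedding, one value per timestep, to land on each pivot."""
    z, null, nulls = pivots[-1], null_init, []
    for t in range(T, 0, -1):
        c, k = ddim_coeffs(t, t - 1)
        for _ in range(iters):
            pred = c * z + k * guided_eps(z, cond, null)
            # d(pred)/d(null) = k * (1 - w), since d(eps_model)/d(emb) = 1 in this toy.
            grad = 2 * (pred - pivots[t - 1]) * k * (1 - GUIDANCE)
            null -= lr * grad
        nulls.append(null)  # the next timestep starts from this value
        z = c * z + k * guided_eps(z, cond, null)
    return z, nulls


def sample(z_T, cond, nulls):
    """Guided DDIM sampling from z_T using one null embedding per timestep."""
    z = z_T
    for t, null in zip(range(T, 0, -1), nulls):
        c, k = ddim_coeffs(t, t - 1)
        z = c * z + k * guided_eps(z, cond, null)
    return z


z0, cond = 1.0, 0.8
pivots = ddim_invert(z0, cond)
_, nulls = null_text_optimize(pivots, cond)
plain_err = abs(sample(pivots[-1], cond, [0.0] * T) - z0)  # fixed null embedding
opt_err = abs(sample(pivots[-1], cond, nulls) - z0)        # optimized null embeddings
```

In this toy setting, guided sampling with a fixed null embedding drifts away from the starting latent, while the per-timestep optimized values bring it back. The conditional embedding `cond` and the model itself are never modified, which is the property the abstract relies on for prompt-based editing.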

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Editing | PIE-Bench | PSNR 27.03 | 166 |
| Dehazing | SOTS | -- | 154 |
| Image Reconstruction | COCO 2017 (val) | PSNR 26.61 | 123 |
| Subject-driven image generation | DreamBench | DINO Score 56.9 | 100 |
| Instructive image editing | EMU Edit (test) | CLIP Image Similarity 0.761 | 55 |
| Image Editing | PIE-Bench (test) | -- | 55 |
| Instructive image editing | MagicBrush (test) | CLIP Image 0.752 | 37 |
| Image Editing | User Study 100 images (test) | User Selection Rate 65.1 | 32 |
| Image Editing | AnyEdit (test) | CLIP Score (Input) 0.773 | 28 |
| Dehazing | RESIDE | FID 39.94 | 25 |
Showing 10 of 67 rows
