Null-text Inversion for Editing Real Images using Guided Diffusion Models
About
Recent text-guided diffusion models provide powerful image generation capabilities. Considerable effort is currently devoted to modifying these images using text alone, as a means of offering intuitive and versatile editing. To edit a real image using these state-of-the-art tools, one must first invert the image, together with a meaningful text prompt, into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate intuitive text-based modification of the image. Our proposed inversion consists of two novel key components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestep and optimize around it. We demonstrate that a direct inversion is inadequate on its own, but does provide a good anchor for our optimization. (ii) Null-text optimization, where we modify only the unconditional textual embedding that is used for classifier-free guidance, rather than the input text embedding. This keeps both the model weights and the conditional embedding intact, and hence enables prompt-based editing while avoiding cumbersome tuning of the model's weights. Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt edits, showing high-fidelity editing of real images.
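The two components described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the latent and the embeddings are plain scalars, `eps_model` is a hypothetical linear stand-in for the Stable Diffusion noise predictor, the `alphas` schedule is made up, and gradients are taken by finite differences with plain gradient descent rather than autodiff with a proper optimizer. All function names are illustrative assumptions. The sketch only aims to show the shape of the procedure: a DDIM inversion with guidance scale 1 gives the pivotal trajectory, and a per-timestep null embedding is then tuned so that guided sampling follows that trajectory.

```python
import math

def eps_model(z, emb):
    """Hypothetical stand-in for the noise-prediction network (linear toy)."""
    return 0.4 * z + 0.6 * emb

def guided_eps(z, cond, null, w):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    e_uncond = eps_model(z, null)
    e_cond = eps_model(z, cond)
    return e_uncond + w * (e_cond - e_uncond)

def ddim_move(z, eps, a_from, a_to):
    """Deterministic DDIM update between two cumulative-alpha levels."""
    scale = math.sqrt(a_to / a_from)
    return scale * z + (math.sqrt(1 - a_to) - scale * math.sqrt(1 - a_from)) * eps

def ddim_invert(z0, cond, alphas):
    """Pivotal trajectory z*_0 .. z*_T, computed with guidance scale 1."""
    pivots = [z0]
    for t in range(len(alphas) - 1):
        eps = guided_eps(pivots[-1], cond, 0.0, 1.0)  # w = 1 ignores the null embedding
        pivots.append(ddim_move(pivots[-1], eps, alphas[t], alphas[t + 1]))
    return pivots

def null_text_optimize(pivots, cond, alphas, w=7.5, lr=0.05, inner_steps=200):
    """Tune one null embedding per timestep so guided sampling tracks the pivots."""
    z_bar, null, nulls = pivots[-1], 0.0, []
    for t in range(len(alphas) - 1, 0, -1):
        def loss(n):
            eps = guided_eps(z_bar, cond, n, w)
            return (ddim_move(z_bar, eps, alphas[t], alphas[t - 1]) - pivots[t - 1]) ** 2
        for _ in range(inner_steps):
            # Central finite difference; a real implementation would use autodiff.
            grad = (loss(null + 1e-4) - loss(null - 1e-4)) / 2e-4
            null -= lr * grad
        nulls.append(null)  # the tuned value also initializes the next timestep
        z_bar = ddim_move(z_bar, guided_eps(z_bar, cond, null, w), alphas[t], alphas[t - 1])
    return nulls

def reconstruct(z_T, cond, nulls, alphas, w=7.5):
    """Guided DDIM sampling from z_T using the per-timestep null embeddings."""
    z = z_T
    for t, null in zip(range(len(alphas) - 1, 0, -1), nulls):
        z = ddim_move(z, guided_eps(z, cond, null, w), alphas[t], alphas[t - 1])
    return z

if __name__ == "__main__":
    alphas = [0.999, 0.9, 0.7, 0.5, 0.3, 0.1]  # made-up cumulative alphas, t = 0..T
    pivots = ddim_invert(2.0, 1.0, alphas)
    nulls = null_text_optimize(pivots, 1.0, alphas)
    print("optimized:", reconstruct(pivots[-1], 1.0, nulls, alphas))
    print("fixed null:", reconstruct(pivots[-1], 1.0, [0.0] * len(nulls), alphas))
```

In this toy setting, only the null embeddings change between the two reconstructions; the model and the conditional embedding `cond` are left untouched, which is the property the abstract highlights as enabling prompt-based editing afterwards.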
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Editing | PIE-Bench | PSNR 27.03 | 166 |
| Dehazing | SOTS | -- | 154 |
| Image Reconstruction | COCO 2017 (val) | PSNR 26.61 | 123 |
| Subject-driven image generation | DreamBench | DINO Score 56.9 | 100 |
| Instructive image editing | EMU Edit (test) | CLIP Image Similarity 0.761 | 55 |
| Image Editing | PIE-Bench (test) | -- | 55 |
| Instructive image editing | MagicBrush (test) | CLIP Image 0.752 | 37 |
| Image Editing | User Study 100 images (test) | User Selection Rate 65.1 | 32 |
| Image Editing | AnyEdit (test) | CLIP Score (Input) 0.773 | 28 |
| Dehazing | RESIDE | FID 39.94 | 25 |