
EDICT: Exact Diffusion Inversion via Coupled Transformations

About

Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate state along the path that the denoising would follow given the original conditioning. However, DDIM inversion for real images is unstable as it relies on local linearization assumptions, which result in the propagation of errors, leading to incorrect image reconstruction and loss of content. To alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors which are used to invert each other in an alternating fashion. Using Stable Diffusion, a state-of-the-art latent diffusion model, we demonstrate that EDICT successfully reconstructs real images with high fidelity. On complex image datasets like MS-COCO, EDICT reconstruction significantly outperforms DDIM, improving the mean square error of reconstruction by a factor of two. Using noise vectors inverted from real images, EDICT enables a wide range of image edits, from local and global semantic edits to image stylization, while maintaining fidelity to the original image structure. EDICT requires no model training/finetuning, prompt tuning, or extra data and can be combined with any pretrained DDM. Code is available at https://github.com/salesforce/EDICT.
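The coupled, alternating update described in the abstract can be sketched in a few lines. This is an illustrative toy version, not the paper's code: `eps` stands in for the conditioned noise-prediction network, `a_t` and `b_t` for DDIM schedule coefficients, and `p` for the mixing weight of the averaging layers. Because each latent is updated using only the *other* latent (as in an affine coupling layer), every step can be undone exactly, up to floating-point precision:

```python
import numpy as np

def edict_denoise_step(x, y, eps, a_t, b_t, p=0.93):
    # Alternating coupled updates: each latent is denoised
    # conditioned on the other, so the step stays invertible
    # even though eps is an arbitrary nonlinear function.
    x_inter = a_t * x + b_t * eps(y)
    y_inter = a_t * y + b_t * eps(x_inter)
    # Averaging (mixing) layers: invertible affine combinations
    # that keep the two sequences from drifting apart.
    x_new = p * x_inter + (1 - p) * y_inter
    y_new = p * y_inter + (1 - p) * x_new
    return x_new, y_new

def edict_invert_step(x_new, y_new, eps, a_t, b_t, p=0.93):
    # Undo the mixing, then the coupled updates, in reverse order.
    y_inter = (y_new - (1 - p) * x_new) / p
    x_inter = (x_new - (1 - p) * y_inter) / p
    y = (y_inter - b_t * eps(x_inter)) / a_t
    x = (x_inter - b_t * eps(y)) / a_t
    return x, y

# Demo with a stand-in nonlinear "network": a round trip through
# the denoise and invert steps recovers the original latents.
rng = np.random.default_rng(0)
x0, y0 = rng.normal(size=4), rng.normal(size=4)
eps = np.tanh  # placeholder for the noise-prediction model
xn, yn = edict_denoise_step(x0, y0, eps, a_t=0.9, b_t=0.1)
xr, yr = edict_invert_step(xn, yn, eps, a_t=0.9, b_t=0.1)
print(np.allclose(xr, x0), np.allclose(yr, y0))
```

In the full method, one such pair of updates is applied per diffusion timestep; running the inverse steps from a real image's latent recovers the noise vectors used for editing.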

Bram Wallace, Akash Gokul, Nikhil Naik • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Editing | PIE-Bench (test) | - | - | 46 |
| Text-to-Image Generation | MS-COCO 5k samples, Stable Diffusion v1.5 (test) | CLIP Score | 31.17 | 34 |
| Unconditional Image Generation | CelebA-HQ 256x256 | Fréchet Distance (FD) | 551.1 | 27 |
| Image Editing | PIE-Bench 1.0 (test) | PSNR | 29.79 | 22 |
| Image Reconstruction | MS-COCO 2017 (val) | - | - | 20 |
| Image Reconstruction | COCO (val) | MSE | 0.0153 | 15 |
| Text-guided Image-to-Image Translation | ImageNet-R TI2I modified | CLIP Similarity | 29 | 10 |
| Pure text-guided image editing | Custom 200 samples (test) | CLIP-T | 0.327 | 9 |
| Image Inversion | PIE-Bench | Inference Time (s) | 35.48 | 6 |
| Content Replacement (Object/Background) | PIE-Bench | NIMA | 5.306 | 5 |

Showing 10 of 12 benchmark rows.

Other info

Code: https://github.com/salesforce/EDICT
