Blended Diffusion for Text-driven Editing of Natural Images

About

Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by combining a pretrained language-image model (CLIP), which steers the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM), which generates natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, background preservation, and text matching. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation. Code is available at: https://omriavrahami.com/blended-diffusion-page/
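The spatial blending step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: a linear beta schedule, a mask equal to 1 inside the ROI and 0 elsewhere, and a hypothetical callable `guided_denoise_step(x, t)` standing in for one CLIP-guided DDPM step. At every step the text-guided sample is kept inside the mask, while the region outside it is replaced by the input image noised to the matching level: x_{t-1} = m * x_fg + (1 - m) * x_bg, with x_bg = sqrt(alpha_bar_{t-1}) * x_0 + sqrt(1 - alpha_bar_{t-1}) * eps.

```python
import numpy as np


def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule (an assumption here)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def noise_to_level(x0, alpha_bar, rng):
    """Sample from q(x_t | x_0) = N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def blended_diffusion(x0, mask, guided_denoise_step, T=50, seed=0):
    """Text-guided sampling inside `mask`, noised input image outside it.

    x0: input image; mask: 1 inside the ROI, 0 elsewhere (broadcastable to x0).
    guided_denoise_step(x, t) is a hypothetical stand-in for one CLIP-guided
    DDPM step that maps a sample at noise level t to level t - 1.
    """
    rng = np.random.default_rng(seed)
    alpha_bar = linear_alpha_bar(T)
    x = rng.standard_normal(x0.shape)  # x_T: start from pure Gaussian noise
    for t in range(T - 1, -1, -1):
        x_fg = guided_denoise_step(x, t)  # text-guided proposal at level t - 1
        # Noise level of this step's output; the final step targets the clean image.
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        x_bg = noise_to_level(x0, a_prev, rng)  # input image noised to the same level
        x = mask * x_fg + (1.0 - mask) * x_bg  # spatial blend
    return x
```

Because the last step blends with the un-noised input (alpha_bar = 1), pixels outside the mask come out identical to the input image. The CLIP guidance and the augmentations the abstract mentions are deliberately left out; they would live inside the hypothetical `guided_denoise_step`.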

Omri Avrahami, Dani Lischinski, Ohad Fried · 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
Class-conditional Image Generation | ImageNet 256x256 (val) | -- | -- | 293
Illumination-preserving image editing | 16 concepts under seven illuminants 1.0 (test) | Angular Error | 12.91 | 12
Object Replacement and Style Blending | Object Replacement and Style Blending (800 pairs) (test) | BOSM | 0.4903 | 11
Image Editing | MagicBrush Single-Turn | L1 Loss | 3.5631 | 11
Object Replacement and Object Blending | Unsplash 4,000 samples (test) | BOM | 0.7241 | 10
Text-guided object inpainting | OpenImages | Local FID | 21.93 | 10
Description-guided Image Editing | MagicBrush multi-turn (test) | L1 Loss | 14.5439 | 10
Exemplar-based Image Editing | Exemplar-based Image Editing User Study (test) | Quality | 3.93 | 5
Exemplar-based Image Editing | COCOEE | FID | 4.6 | 5
Text-guided image inpainting | MS-COCO | NIMA Score | 5.198 | 5

Showing 10 of 13 rows.
