Blended Diffusion for Text-driven Editing of Natural Images

About

Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, ability to preserve the background and matching the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation. Code is available at: https://omriavrahami.com/blended-diffusion-page/
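The blending idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code. The linear beta schedule, the function names, and the `guided_denoise` callable (a stand-in for one CLIP-guided reverse diffusion step) are all assumptions made for illustration. So is the choice to blend with the clean input at the final step. At every step, the text-guided latent is kept inside the mask and a freshly noised copy of the input image is used outside it.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_image(x0, alpha_bar_t, rng):
    """Sample a noised version of x0 at the noise level given by alpha_bar_t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blend_step(x_fg, x0, mask, alpha_bar_t, rng):
    """Keep the guided latent inside the mask and the noised input outside it."""
    x_bg = noise_image(x0, alpha_bar_t, rng)
    return mask * x_fg + (1.0 - mask) * x_bg


def blended_edit(x0, mask, guided_denoise, num_steps=50, seed=0):
    """Run the reverse process, blending with the input at every noise level.

    guided_denoise(x, t) is a hypothetical callable standing in for one
    text-guided reverse step; a real system would call a diffusion model here.
    """
    rng = np.random.default_rng(seed)
    alpha_bar = linear_alpha_bar(num_steps)
    x = rng.standard_normal(x0.shape)  # start from pure Gaussian noise
    for t in range(num_steps - 1, -1, -1):
        x_fg = guided_denoise(x, t)
        # The step output lives at noise level t - 1. At the last step the
        # sketch blends with the clean input (alpha_bar = 1, i.e. no noise).
        level = alpha_bar[t - 1] if t > 0 else 1.0
        x = blend_step(x_fg, x0, mask, level, rng)
    return x


# Toy usage: an 8x8 "image", a square ROI, and a dummy denoiser.
image = np.random.default_rng(42).uniform(-1.0, 1.0, size=(8, 8))
roi = np.zeros((8, 8))
roi[2:6, 2:6] = 1.0
edited = blended_edit(image, roi, lambda x, t: 0.9 * x, num_steps=10)
```

Because the last blend uses the un-noised input, pixels outside the ROI in `edited` match `image`, which is the background-preservation property the abstract refers to.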

Omri Avrahami, Dani Lischinski, Ohad Fried • 2021

Related benchmarks

Task	Dataset	Metric	Result	
Class-conditional Image Generation	ImageNet 256x256 (val)	Inception Score (IS)	39.8	493
Multi-object recognition	COCO (val)	Exclusive mAP	24.87	17
Object recognition	Pascal (test)	Exclusive mAP	57.51	17
3D Reconstruction	Toys4k (Preserved Part)	Appearance PSNR	19.56	14
3D Inpainting	Toys4k Inpainting Part	CLIP Score	30.17	14
Illumination-preserving image editing	16 concepts under seven illuminants 1.0 (test)	Angular Error	12.91	12
Object Replacement and Style Blending	Object Replacement and Style Blending (800 pairs) (test)	BOSM	0.4903	11
Image Editing	MagicBrush Single-Turn	L1 Loss	3.5631	11
Object Replacement and Object Blending	Unsplash 4,000 samples (test)	BOM	0.7241	10
Text-guided object inpainting	OpenImages	Local FID	21.93	10

Showing 10 of 22 rows
