TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

About

Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the diffusion backwards process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to mismatched noise statistics between inverted noises and the expected noise schedule, and suggest a shifted noise schedule which corrects for this offset. To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts. All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.
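
As a rough illustration of two ideas the abstract names, the sketch below shows (a) how an edit-friendly inversion might recover a per-step noise map from a stored latent and the model's predicted mean, and (b) a guidance-style extrapolation that amplifies the difference between target-prompt and source-prompt predictions. The function names, the toy arrays, and both formulas are assumptions made for illustration only. The abstract does not spell out the authors' formulation, and the shifted noise schedule is not modeled here.

```python
import numpy as np

def extract_noise(x_prev, mu_t, sigma_t):
    # Assumed edit-friendly rule: the noise map is whatever carries the
    # predicted mean mu_t onto the stored latent x_prev at scale sigma_t.
    return (x_prev - mu_t) / sigma_t

def amplify_edit(pred_src, pred_tgt, scale):
    # Assumed guidance-style extrapolation: start at the source-prompt
    # prediction and move `scale` times the source-to-target difference.
    return pred_src + scale * (pred_tgt - pred_src)

rng = np.random.default_rng(0)

# Toy stand-ins for a stored latent and a denoiser's predicted mean.
x_prev = rng.standard_normal((4, 4))
mu_t = rng.standard_normal((4, 4))
sigma_t = 0.5

z_t = extract_noise(x_prev, mu_t, sigma_t)
# Re-injecting the extracted noise reproduces the stored latent.
reconstructed = mu_t + sigma_t * z_t

# Toy stand-ins for predictions under the source and target prompts.
pred_src = rng.standard_normal((4, 4))
pred_tgt = rng.standard_normal((4, 4))
stronger_edit = amplify_edit(pred_src, pred_tgt, scale=1.5)
```

With `scale=1.0` the extrapolation returns the target-prompt prediction unchanged. Values above 1 push further along the edit direction, which is the sense in which a guidance-like term could raise editing strength.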

Gilad Deutch, Rinon Gal, Daniel Garibi, Or Patashnik, Daniel Cohen-Or • 2024

Related benchmarks

Task	Dataset	Result
Image Editing	PIE-Bench	PSNR 26.04	215
Image Editing	GEdit-Bench	Semantic Consistency 3.84	102
Image Editing	PIE-Bench (test)	PSNR 22.43	55
Image Editing	PIE-Bench	PSNR 21.44	25
Image Editing	PIE-Bench 1.0 (test)	PSNR 22.43	22
Text-Guided Image Editing	PIE-Bench	Structure Distance 79.87	16
Layout-free HOI editing	IEBench	Editability-Identity 0.434	14
Text-Guided Image Editing	General Image Editing	Speedup 19.68	12
Object Replacement and Style Blending	Object Replacement and Style Blending (800 pairs) (test)	BOSM 0.3829	11
Object Replacement and Object Blending	Unsplash 4,000 samples (test)	BOM 0.3199	10

Showing 10 of 13 rows
