TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
About
Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the backward diffusion process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to mismatched noise statistics between the inverted noises and the expected noise schedule, and suggest a shifted noise schedule that corrects for this offset. To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts. All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.
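For readers unfamiliar with the inversion scheme the abstract builds on, the following is a minimal sketch of edit-friendly DDPM-noise inversion in its commonly described form: noisy latents are drawn independently at each step, and the per-step noise maps are then solved for so that sampling with them reproduces the input exactly. It is not the authors' implementation. The noise predictor `toy_eps_model`, the three-step schedule `abar`, and the choice of a final cumulative alpha slightly below 1 are all assumptions made for illustration, and the shifted schedule and pseudo-guidance proposed in the paper are not shown.

```python
import numpy as np

def toy_eps_model(x_t, t):
    # Hypothetical stand-in for a trained noise predictor (assumption).
    return 0.1 * x_t

def ddpm_sigma(abar_t, abar_prev):
    # Standard DDPM (eta = 1) posterior standard deviation.
    return np.sqrt((1 - abar_prev) / (1 - abar_t)) * np.sqrt(1 - abar_t / abar_prev)

def posterior_mean(x_t, t, abar_t, abar_prev, sigma):
    # Deterministic part of one reverse step, given the predicted noise.
    eps = toy_eps_model(x_t, t)
    x0_pred = (x_t - np.sqrt(1 - abar_t) * eps) / np.sqrt(abar_t)
    direction = np.sqrt(1 - abar_prev - sigma**2) * eps
    return np.sqrt(abar_prev) * x0_pred + direction

def edit_friendly_invert(x0, abar, rng):
    # abar[0] is the (near-clean) final level; abar[1..T] are the noisy levels.
    T = len(abar) - 1
    # Each noisy latent uses its own independent noise sample.
    xs = [x0] + [
        np.sqrt(abar[t]) * x0 + np.sqrt(1 - abar[t]) * rng.standard_normal(x0.shape)
        for t in range(1, T + 1)
    ]
    zs = {}
    for t in range(T, 0, -1):
        sigma = ddpm_sigma(abar[t], abar[t - 1])
        mu = posterior_mean(xs[t], t, abar[t], abar[t - 1], sigma)
        # Solve for the noise map that carries x_t to x_{t-1}.
        zs[t] = (xs[t - 1] - mu) / sigma
    return xs[T], zs

def regenerate(x_T, zs, abar):
    # Re-run the sampler with the extracted noise maps.
    x = x_T
    for t in range(len(abar) - 1, 0, -1):
        sigma = ddpm_sigma(abar[t], abar[t - 1])
        x = posterior_mean(x, t, abar[t], abar[t - 1], sigma) + sigma * zs[t]
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))            # toy "image"
abar = np.array([0.9999, 0.7, 0.3, 0.05])   # assumed three-step schedule
x_T, zs = edit_friendly_invert(x0, abar, rng)
recon = regenerate(x_T, zs, abar)
print("max reconstruction error:", np.abs(recon - x0).max())
```

Because each `zs[t]` is solved for rather than sampled, nothing forces it to have the unit-variance statistics the sampler normally assumes. Inspecting `zs[t].std()` per step in a sketch like this is one way to see the kind of mismatch the abstract refers to.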
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Editing | PIE-Bench | PSNR 22.51 | 116 |
| Image Editing | PIE-Bench (test) | PSNR 22.43 | 46 |
| Image Editing | PIE-Bench 1.0 (test) | PSNR 22.43 | 22 |
| Image Editing | PIE-Bench | Distance 10313.8 | 17 |
| Text-Guided Image Editing | General Image Editing | Speedup 19.68 | 12 |
| Object Replacement and Style Blending | Object Replacement and Style Blending (800 pairs) (test) | BOSM 0.3829 | 11 |
| Object Replacement and Object Blending | Unsplash 4,000 samples (test) | BOM 0.3199 | 10 |