LEDITS++: Limitless Image Editing using Text-to-Image Models
About
Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from text inputs alone. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They require time-consuming finetuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. First, LEDITS++'s novel inversion approach requires neither tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods.
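The abstract does not spell out how the implicit masking or the combination of several edits is computed. As a rough illustration only, the toy sketch below assumes that each edit produces a per-pixel guidance map, that a mask keeps only the strongest entries of that map (a magnitude quantile threshold), and that the masked maps of several edits are simply summed. The function names, the `keep_fraction` parameter, and the thresholding rule are all hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[float]]  # toy stand-in for a per-pixel guidance map


def quantile_threshold(values: List[float], keep_fraction: float) -> float:
    """Return the magnitude of the k-th largest value, k ~ keep_fraction * n."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(keep_fraction * len(ordered)))
    return ordered[k - 1]


def masked_guidance(guidance: Grid, keep_fraction: float) -> Grid:
    """Zero every entry whose magnitude falls below the threshold (assumed rule)."""
    magnitudes = [abs(v) for row in guidance for v in row]
    tau = quantile_threshold(magnitudes, keep_fraction)
    return [[v if abs(v) >= tau else 0.0 for v in row] for row in guidance]


def combine_edits(guidances: List[Grid], keep_fraction: float) -> Grid:
    """Mask each edit independently, then sum the masked maps (assumed rule)."""
    rows, cols = len(guidances[0]), len(guidances[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for guidance in guidances:
        masked = masked_guidance(guidance, keep_fraction)
        for i in range(rows):
            for j in range(cols):
                total[i][j] += masked[i][j]
    return total


if __name__ == "__main__":
    # Two hypothetical edits, each strongest in a different image region.
    edit_a = [[0.9, 0.1], [0.0, -0.05]]
    edit_b = [[0.02, 0.0], [-0.1, -0.8]]
    print(combine_edits([edit_a, edit_b], keep_fraction=0.25))
```

In this toy setting each edit only touches the region where its own guidance is strongest, so two edits aimed at different regions do not interfere with each other or with the rest of the image.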
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Editing | PIE-Bench | PSNR 24.67 | 116 |
| Image Editing | EditEval v2 | LPIPS 0.3554 | 14 |
| Image Editing | 1024 x 1024 resolution | Runtime (4090, s) 33.19 | 14 |
| Instructional Image Editing | OmniEdit 1.0 (test) | Swap-0.81 | 13 |
| Image Editing | MagicBrush Single-Turn | L1 Loss 0.094 | 11 |
| Affective Image Stylization | EmoEdit (inference) | CLIP Score 0.687 | 11 |
| Object Replacement and Style Blending | Object Replacement and Style Blending (800 pairs) (test) | BOSM 0.2693 | 11 |
| Object Replacement and Object Blending | Unsplash 4,000 samples (test) | BOM 0.3913 | 10 |
| Sad Facial Attribute Editing | CelebA-HQ (test) | Sdir 0.169 | 8 |
| Smiling Facial Attribute Editing | CelebA-HQ (test) | Sdir 0.182 | 8 |