Edicho: Consistent Image Editing in the Wild
About
Consistent editing across in-the-wild images is a verified practical need, yet it remains technically challenging because of many uncontrollable factors, such as object pose, lighting conditions, and photographic environment. Edicho addresses this with a training-free solution based on diffusion models, built on one core design principle: using explicit image correspondence to guide editing. Its key components are an attention manipulation module and a carefully refined classifier-free guidance (CFG) denoising strategy, both of which take the pre-estimated correspondence into account. Because it runs purely at inference time, the algorithm is plug-and-play and compatible with most diffusion-based editing methods, such as ControlNet and BrushNet. Extensive results demonstrate the efficacy of Edicho for consistent cross-image editing under diverse settings. We will release the code to facilitate future studies.
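The abstract names two correspondence-aware components but gives no formulas. The pure-Python sketch below shows one way explicit correspondence could steer each of them. It assumes a dense correspondence stored as one source-token index per target token. For attention, each target query is blended with its matched source query before attending over the source keys and values. For CFG, the source's unconditional noise prediction is warped into the target layout and blended into the target's unconditional branch before the standard guidance step. All function names, the correspondence format, and the blending weights `alpha` and `beta` are hypothetical. This is not Edicho's released implementation or its exact formulation.

```python
import math
from typing import List, Sequence

Vec = List[float]
Tokens = List[Vec]  # one feature vector per spatial token


def softmax(scores: Sequence[float]) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vec, keys: Tokens, values: Tokens) -> Vec:
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]


def warp(features: Tokens, corr: Sequence[int]) -> Tokens:
    """Re-index source tokens into the target layout.

    corr[i] is the index of the source token matched to target token i
    (hypothetical dense-correspondence format)."""
    return [list(features[j]) for j in corr]


def lerp(a: Vec, b: Vec, t: float) -> Vec:
    """Linear blend: t=0 returns a, t=1 returns b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def corr_guided_attention(tgt_q: Tokens, src_q: Tokens, src_k: Tokens, src_v: Tokens,
                          corr: Sequence[int], alpha: float = 0.5) -> Tokens:
    """Hypothetical: blend each target query with its matched source query,
    then attend over the source keys/values so matched regions borrow the same content."""
    warped_q = warp(src_q, corr)
    fused_q = [lerp(tq, wq, alpha) for tq, wq in zip(tgt_q, warped_q)]
    return [attend(q, src_k, src_v) for q in fused_q]


def cfg(eps_uncond: Tokens, eps_cond: Tokens, scale: float) -> Tokens:
    """Standard classifier-free guidance, per token: u + scale * (c - u)."""
    return [[u + scale * (c - u) for u, c in zip(tu, tc)]
            for tu, tc in zip(eps_uncond, eps_cond)]


def corr_cfg(tgt_uncond: Tokens, tgt_cond: Tokens, src_uncond: Tokens,
             corr: Sequence[int], scale: float, beta: float = 0.5) -> Tokens:
    """Hypothetical: fuse the warped source unconditional prediction into the
    target's unconditional branch, then apply standard CFG."""
    warped_u = warp(src_uncond, corr)
    fused_u = [lerp(tu, wu, beta) for tu, wu in zip(tgt_uncond, warped_u)]
    return cfg(fused_u, tgt_cond, scale)
```

Setting `beta=0` reduces `corr_cfg` to ordinary CFG, and `alpha=0` leaves the target queries untouched. Plain lists stand in for tensors only to keep the sketch dependency-free.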
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Consistent image-set generation | Curated image-set consistency benchmark (400 edits, 149 image sets) 1.0 (test) | CLIP Score | 0.65 | 8 |
| Global Image Editing | GroupEditBench | CLIP Score | 0.292 | 4 |
| Local Image Editing | GroupEditBench local editing | CLIP Score | 30.59 | 4 |