Visual Instruction Inversion: Image Editing via Visual Prompting
About
Text-conditioned image editing has emerged as a powerful tool for manipulating images. However, in many situations, language can be ambiguous and ineffective at describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given an example pair that represents the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.
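The core idea — optimizing an editing direction so that applying it to the "before" example reproduces the "after" example, then reusing it on new images — can be illustrated with a toy sketch. This is not the paper's implementation (which optimizes a text instruction embedding inside a pretrained text-to-image diffusion model); here the "edit" is simply a learned vector in a made-up feature space, recovered by gradient descent on a reconstruction loss:

```python
import numpy as np

# Toy illustration (NOT the paper's method): treat an edit as a direction in
# an 8-dim feature space, recover it from one before/after pair by minimizing
# a reconstruction loss, then apply it to a new image's features.
rng = np.random.default_rng(0)

before = rng.normal(size=8)        # features of the "before" example
true_edit = np.full(8, 0.5)        # ground-truth edit (unknown to the learner)
after = before + true_edit         # features of the "after" example

# "Invert" the visual prompt: optimize a direction so that
# before + direction ≈ after, analogous to optimizing an instruction
# so the diffusion edit maps "before" to "after".
direction = np.zeros(8)
lr = 0.1
for _ in range(200):
    grad = 2 * (before + direction - after)  # gradient of ||before + d - after||^2
    direction -= lr * grad

# Reuse the learned edit on a new image.
new_image = rng.normal(size=8)
edited = new_image + direction
```

In the actual framework the optimization target is a soft text embedding and the loss is the diffusion model's denoising objective, but the one-shot "learn from a single pair, then transfer" structure is the same.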
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dehazing | SOTS | -- | -- | 154 |
| Super-Resolution | FFHQ 1k | FID | 110.4 | 23 |
| Image Deblurring | FFHQ 1k | FID | 122.6 | 16 |
| Image Colorization | DIV2K | FID | 298.1 | 16 |
| Image Denoising | BSD400 (test) | FID | 248.8 | 16 |
| Image Deraining | Rain100L | FID | 203.8 | 13 |
| Surface Normal Estimation | Bedroom Images In-domain | L1 Error | 0.2081 | 11 |
| Intrinsic Image Decomposition | Bedroom Images In-domain | Albedo MSE | 0.0145 | 8 |
| Intrinsic Image Decomposition | Bedroom Images Out-of-domain | Albedo MSE | 0.0246 | 8 |
| Monocular Depth Estimation | Bedroom Images In-domain | REL | 34.98 | 8 |