Visual Instruction Inversion: Image Editing via Visual Prompting

About

Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.
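The core idea in the abstract, keeping a pretrained editor frozen and optimizing only the instruction so that the "before" image maps to the "after" image, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the linear `apply_edit` function, the fixed matrix `W`, and the squared-error loss are hypothetical stand-ins for a diffusion model and its training objective, not the authors' implementation.

```python
# Toy illustration only: a fixed linear map stands in for a frozen,
# instruction-conditioned image editor. W is hypothetical and never updated.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]


def apply_edit(image, instruction):
    """Frozen 'editor': shift each pixel by a linear function of the instruction."""
    return [x + sum(w * c for w, c in zip(row, instruction))
            for x, row in zip(image, W)]


def invert_visual_prompt(before, after, steps=200, lr=0.1):
    """Learn an instruction vector from one before/after pair by gradient descent."""
    instruction = [0.0] * len(W[0])
    for _ in range(steps):
        # Residual between the edited "before" image and the target "after" image.
        residual = [p - t for p, t in zip(apply_edit(before, instruction), after)]
        # Gradient of the squared error with respect to the instruction only.
        grad = [2.0 * sum(W[i][k] * residual[i] for i in range(len(W)))
                for k in range(len(instruction))]
        instruction = [c - lr * g for c, g in zip(instruction, grad)]
    return instruction


# One example pair defines the edit; the learned instruction is reused on a new image.
before = [0.2, 0.4, 0.6, 0.8]
after = apply_edit(before, [0.5, -1.0])
learned = invert_visual_prompt(before, after)
new_image = [1.0, 1.0, 1.0, 1.0]
edited = apply_edit(new_image, learned)
```

In this sketch the editor's parameters are never touched; only the instruction vector is learned, which is the sense in which a visual prompt is "inverted" into an instruction that can then be applied to unseen images.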

Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee • 2023

Related benchmarks

Task | Dataset | Result | Rank
Surface Normal Estimation | Bedroom Images In-domain | L1 Error 0.2081 | 11
Intrinsic Image Decomposition | Bedroom Images In-domain | Albedo MSE 0.0145 | 8
Intrinsic Image Decomposition | Bedroom images Out-of-domain | Albedo MSE 0.0246 | 8
Monocular Depth Estimation | Bedroom Images In-domain | REL 34.98 | 8
Monocular Depth Estimation | Generalization Images Out-of-domain | Relative Error (REL) 0.5364 | 8
Surface Normal Estimation | Generalization Images Out-of-domain | L1 Error 0.2448 | 8
Image Manipulation | Image manipulation Few-shot (In Distribution) | CLIP-Dir 15.85 | 7
Semantic segmentation | Bedroom dataset | Bed Accuracy 0.6 | 7
Image Analogy Generation | InstructPix2Pix (test) | CLIP Directional Score 0.1007 | 6
Image Manipulation | Few-shot image manipulation (Out of Distribution) | CLIP Directional Score 14.69 | 6
Showing 10 of 17 rows
