Visual Prompting via Image Inpainting

About

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting - literally just filling in a hole in a concatenated visual prompt image - turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated: 88k unlabeled figures sourced from academic papers on arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, and edge detection.
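
To make the "hole in a concatenated visual prompt image" concrete, here is a minimal sketch of how such a prompt could be assembled. It assumes a 2x2 grid layout (example input and example output on top, query and an empty cell below), equally sized grayscale images stored as nested lists, and hypothetical helper names (`build_visual_prompt`, `crop_prediction`); none of these details are specified in the abstract above. The inpainting model itself is out of scope here.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel values


def hstack(left: Image, right: Image) -> Image:
    """Concatenate two equal-height images side by side."""
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def build_visual_prompt(
    example_in: Image, example_out: Image, query: Image, fill: float = 0.0
) -> Tuple[Image, List[List[int]]]:
    """Build a 2x2 prompt canvas and its inpainting mask (assumed layout).

    Top row:    example input | example output
    Bottom row: query input   | hole to be filled
    The mask is 1 where the model is expected to inpaint, 0 elsewhere.
    """
    h, w = len(query), len(query[0])
    for img in (example_in, example_out):
        if len(img) != h or any(len(row) != w for row in img):
            raise ValueError("this sketch assumes all images share one size")
    hole = [[fill] * w for _ in range(h)]
    canvas = hstack(example_in, example_out) + hstack(query, hole)
    mask = [[0] * (2 * w) for _ in range(h)]
    mask += [[0] * w + [1] * w for _ in range(h)]
    return canvas, mask


def crop_prediction(inpainted: Image, h: int, w: int) -> Image:
    """Read the task output back from the bottom-right cell of the canvas."""
    return [row[-w:] for row in inpainted[-h:]]
```

Under these assumptions, the canvas and mask would be passed to an inpainting model, and `crop_prediction` would extract the filled-in cell as the predicted output for the query image.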

Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros • 2022

Related benchmarks

Task                      Dataset                     Result            Rank
Semantic segmentation     PASCAL-5^i Fold-0           mIoU 28.66        75
Semantic segmentation     PASCAL-5^i Fold-1           mIoU 30.21        75
Semantic segmentation     PASCAL-5^i Fold-2           mIoU 27.81        75
Semantic segmentation     PASCAL-5^i Fold-3           mIoU 23.55        75
3D Pose Estimation        Human3.6M                   MPJPE (mm) 351    66
Few-shot Segmentation     FSS-1000 (test)             mIoU 58.3         50
Few-shot Segmentation     PASCAL-5i                   --                46
Single Object Detection   PASCAL VOC 2012             mIoU 25.45        37
Foreground segmentation   Pascal-5i (3)               mIoU 26.15        25
Foreground segmentation   Pascal-5i Fold-0 (test)     mIoU 28.66        25
Showing 10 of 56 rows
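
Most results above are reported as mIoU (mean intersection over union). As a rough illustration of the metric, the sketch below computes IoU for binary masks and averages it over (prediction, ground truth) pairs. The function names are hypothetical, and the exact protocol behind each benchmark row (for example, averaging per class or per fold) is not stated on this page, so treat this as an assumption rather than the evaluation actually used.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask: 1 = foreground, 0 = background


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; define their IoU as 1.0.
    return inter / union if union else 1.0


def mean_iou(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (prediction, ground truth) pairs, as a fraction."""
    if not pairs:
        raise ValueError("need at least one (prediction, target) pair")
    return sum(iou(p, t) for p, t in pairs) / len(pairs)
```

The table lists scores on a 0 to 100 scale, so a value from `mean_iou` would be multiplied by 100 to be comparable.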
