VRP-SAM: SAM with Visual Reference Prompt

About

In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that enables the Segment Anything Model (SAM) to use annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM uses annotated reference images to comprehend specific objects and then segments those objects in a target image. Notably, the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM extends the versatility and applicability of the SAM framework while preserving SAM's inherent strengths, thus enhancing user-friendliness. To improve the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to segment unseen objects and to perform cross-domain segmentation. The source code and models will be available at https://github.com/syp2ysy/VRP-SAM

Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Xiaofan Li, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li • 2024

Related benchmarks

Task                            Dataset                                 Metric            Result  Rank
Video Object Segmentation       DAVIS 2017 (val)                        J mean            62.1    1130
Few-shot Semantic Segmentation  COCO-20i                                mIoU              53.9    115
Few-shot Semantic Segmentation  PASCAL-5i                               mIoU              71.9    96
Semantic segmentation           COCO-20i (test)                         Mean Score        60.4    70
Semantic segmentation           PASCAL 1-shot 5i                        mIoU (fold1)      78.3    57
Semantic segmentation           COCO 20i 1-shot                         Fold 0 Score      48.1    41
Few-shot Semantic Segmentation  COCO-20i binary                         mIoU              53.9    14
Face Segmentation               Authors' Face Occlusion Dataset (test)  Occlusion IoU     47.4    13
Part Segmentation               PASCAL-Part                             mIoU              36.2    10
One-shot semantic segmentation  COCO-20i (novel)                        F-Score (Fold 0)  48.1    9
Showing 10 of 11 rows
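
Several of the rows above report mIoU. As a rough illustration of what that metric measures, here is a minimal sketch that computes a class-averaged IoU over binary masks. It assumes one common convention (accumulate intersection and union per class across episodes, then average the per-class ratios); the exact evaluation protocol behind each benchmark row may differ. The function names and the episode tuple layout are hypothetical and are not taken from the VRP-SAM code.

```python
from collections import defaultdict


def intersection_and_union(pred, target):
    """Count intersection and union pixels of two binary masks.

    Masks are nested sequences of rows holding 0/1 (or False/True) values.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter, union


def mean_iou(episodes):
    """Class-averaged IoU, in percent.

    `episodes` is an iterable of (class_id, pred_mask, target_mask) tuples.
    This layout is an assumption made for the sketch.
    """
    inter_sum = defaultdict(int)
    union_sum = defaultdict(int)
    for class_id, pred, target in episodes:
        inter, union = intersection_and_union(pred, target)
        inter_sum[class_id] += inter
        union_sum[class_id] += union

    # Skip classes whose masks were empty in every episode.
    per_class = [
        inter_sum[c] / union_sum[c] for c in union_sum if union_sum[c] > 0
    ]
    if not per_class:
        raise ValueError("no non-empty masks to evaluate")
    return 100.0 * sum(per_class) / len(per_class)


if __name__ == "__main__":
    episodes = [
        (0, [[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # class 0: IoU 1/2
        (1, [[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # class 1: IoU 2/2
    ]
    print(mean_iou(episodes))
```

In practice the masks would usually be NumPy arrays or tensors; plain nested lists are used here only to keep the sketch dependency-free.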
