VRP-SAM: SAM with Visual Reference Prompt

About

In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that enables the Segment Anything Model (SAM) to use annotated reference images as prompts for segmentation, yielding the VRP-SAM model. In essence, VRP-SAM uses annotated reference images to understand specific objects and then segments those objects in a target image. Notably, the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM extends the versatility and applicability of the SAM framework while preserving SAM's inherent strengths, thus enhancing user-friendliness. To improve the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to segment unseen objects and perform cross-domain segmentation. The source code and models will be available at https://github.com/syp2ysy/VRP-SAM.

Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Xiaofan Li, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li • 2024

Related benchmarks

Task	Dataset	Result
Video Object Segmentation	DAVIS 2017 (val)	J mean 62.1	1226
Few-shot Semantic Segmentation	COCO-20i	mIoU 55.5	178
Few-shot Semantic Segmentation	PASCAL-5i	mIoU 71.9	96
Few-shot Semantic Segmentation	Pascal-5^i	Mean Score 71.8	76
Semantic segmentation	COCO-20i (test)	Mean Score 60.4	70
Semantic segmentation	PASCAL 1-shot 5i	mIoU (fold1) 78.3	57
Few-shot Semantic Segmentation	PASCAL-5^i 1-shot	mIoU 71.9	53
Semantic segmentation	COCO 20i 1-shot	Fold 0 Score 48.1	41
Part Segmentation	PASCAL-Part	mIoU 36.2	22
Few-shot Semantic Segmentation	COCO-20i binary	mIoU 53.9	14

Showing 10 of 13 rows
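
Most of the results above are reported as mIoU (mean intersection-over-union), and the "J mean" used for DAVIS is likewise a mean Jaccard index, which is the same IoU quantity. As a rough illustration of what these numbers measure, here is a minimal sketch in plain Python. The function names `iou` and `mean_iou` are hypothetical, the empty-mask convention is an assumption made for this example, and actual benchmark protocols differ in how they accumulate scores over images, classes, and folds, so this is not the evaluation code behind the table.

```python
from typing import Sequence

# A 2D binary mask: 1 marks object pixels, 0 marks background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union (Jaccard index) of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption for this sketch: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(per_class_ious: Sequence[float]) -> float:
    """Average a list of per-class IoU values and express it as a percentage."""
    return 100.0 * sum(per_class_ious) / len(per_class_ious)


# Toy example: the masks share 2 object pixels out of 4 in their union.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)
```

In this toy case `score` is 0.5, and averaging it with a perfectly segmented second class via `mean_iou([0.5, 1.0])` gives 75.0, which is on the same percentage scale as the table entries.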
