Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes

About

In computer vision, object detection is an important task with applications in many scenarios. However, obtaining extensive labels can be challenging, especially in crowded scenes. Recently, the Segment Anything Model (SAM) has been proposed as a powerful zero-shot segmenter, offering a novel approach to instance segmentation tasks. However, the accuracy and efficiency of SAM and its variants are often compromised when handling objects in crowded and occluded scenes. In this paper, we introduce Crowd-SAM, a SAM-based framework designed to enhance SAM's performance in crowded and occluded scenes at the cost of a few learnable parameters and a minimal number of labeled images. We introduce an efficient prompt sampler (EPS) and a part-whole discrimination network (PWD-Net), which improve mask selection and accuracy in crowded scenes. Despite its simplicity, Crowd-SAM rivals state-of-the-art (SOTA) fully-supervised object detection methods on several benchmarks, including CrowdHuman and CityPersons. Our code is available at https://github.com/FelixCaae/CrowdSAM.

Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang • 2024

Related benchmarks

Task	Dataset	Result
Object Detection	COCO (val)	--	637
Pedestrian Detection	CityPersons (val)	--	85
Object Detection	CrowdHuman (val)	AP 78.4	52
Instance Segmentation	OCHuman (test)	Mask AP 31.4	38
Instance Segmentation	OCHuman (val)	Mask AP 31.4	25
Instance Segmentation	UCF	IoU 32.83	4
Instance Segmentation	JHU	IoU 30.91	4
Instance Segmentation	NWPU	IoU 27.19	4
Instance Segmentation	sha	mIoU 35.3	4
Instance Segmentation	SHB	mIoU 21.16	4

Showing 10 of 13 rows
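
Several rows above report IoU or mIoU. As a minimal sketch of the standard definition (not necessarily the exact evaluation protocol behind these benchmark entries), intersection over union between a predicted and a ground-truth binary mask can be computed as below. The function names `mask_iou` and `mean_iou`, the nested-list mask format, and the convention that two empty masks score 0 are all assumptions made for illustration.

```python
def mask_iou(pred, gt):
    """Intersection over union of two equally sized binary masks (nested lists)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0   # pixel set in both masks
            union += 1 if (p or g) else 0    # pixel set in at least one mask
    # Assumed convention: two empty masks give 0.0 rather than an error.
    return inter / union if union else 0.0


def mean_iou(pairs):
    """Average mask_iou over (prediction, ground truth) pairs."""
    scores = [mask_iou(p, g) for p, g in pairs]
    return sum(scores) / len(scores) if scores else 0.0


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(mask_iou(pred, gt))  # 2 shared pixels out of 4 in the union: 0.5
```

The table values appear to be percentages, so a raw score of 0.5 from this sketch would correspond to 50.0 on that scale.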
