Cut and Learn for Unsupervised Object Detection and Instance Segmentation

About

We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the property of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in an image and then learns a detector on these masks using our robust loss function. We further improve the performance by self-training the model on its predictions. Compared to prior work, CutLER is simpler, compatible with different detection architectures, and detects multiple objects. CutLER is also a zero-shot unsupervised detector and improves detection AP50 by more than 2.7 times on 11 benchmarks spanning domains such as video frames, paintings, and sketches. With finetuning, CutLER serves as a low-shot detector, surpassing MoCo-v2 by 7.3% APbox and 6.6% APmask on COCO when trained with 5% of the labels.
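The abstract names two mechanisms, MaskCut for multi-object pseudo-masks and a robust loss for training on them, without defining either. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes MaskCut repeatedly applies a Normalized Cut to a patch-affinity graph built from self-supervised ViT features; the threshold `tau`, the weak-edge weight `eps`, the mean split of the eigenvector, and the foreground heuristic are illustrative choices. It also assumes the robust loss simply ignores predicted regions whose best IoU with any pseudo-mask falls at or below a small threshold. All function names (`ncut_bipartition`, `maskcut_sketch`, `drop_loss_sketch`) are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def ncut_bipartition(feats, tau=0.15, eps=1e-5):
    """One Normalized Cut over N patch features (array of shape N x C).

    Returns a boolean foreground mask of length N. tau, eps, the mean split
    and the foreground heuristic are illustrative assumptions.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                               # cosine similarity between patches
    W = np.where(sim > tau, 1.0, eps)           # binarized affinity graph
    D = np.diag(W.sum(axis=1))                  # degree matrix (positive definite)
    # Solve (D - W) x = lam * D x; scipy returns eigenvalues in ascending order.
    _, vecs = eigh(D - W, D)
    fiedler = vecs[:, 1]                        # second-smallest eigenvector
    part = fiedler > fiedler.mean()
    # Assumed heuristic: the side holding the largest |entry| is foreground.
    seed = np.argmax(np.abs(fiedler))
    return part if part[seed] else ~part


def maskcut_sketch(feats, num_masks=3, tau=0.15):
    """Hypothetical helper: repeat the cut, dropping patches already assigned
    to an earlier mask, to obtain up to num_masks disjoint masks."""
    n = feats.shape[0]
    active = np.ones(n, dtype=bool)
    masks = []
    for _ in range(num_masks):
        idx = np.flatnonzero(active)
        if idx.size < 2:                        # too few patches left to split
            break
        fg_local = ncut_bipartition(feats[idx], tau=tau)
        mask = np.zeros(n, dtype=bool)
        mask[idx[fg_local]] = True
        masks.append(mask)
        active &= ~mask                         # exclude this object from later cuts
    return masks


def drop_loss_sketch(region_losses, max_ious, iou_thresh=0.01):
    """Hypothetical robust loss: sum per-region losses, but skip predicted
    regions whose best IoU with any pseudo-mask is <= iou_thresh, so the
    detector is not penalized for objects the pseudo-masks missed."""
    keep = max_ious > iou_thresh
    return float(np.sum(region_losses * keep))
```

In use, ViT patch features of shape (H*W, C) would go into `maskcut_sketch`, and each returned flat mask can be reshaped to (H, W). Removing already-assigned patches before the next cut is a simplification chosen here because it guarantees the masks are disjoint.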

Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra • 2023

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 35.7	3069
Object Detection	COCO 2017 (val)	AP 5.9	2843
Instance Segmentation	COCO 2017 (val)	--	1275
Semantic segmentation	ADE20K	mIoU 35.7	1028
Semantic segmentation	Cityscapes	mIoU 18.7	668
Video Instance Segmentation	YouTube-VIS 2019 (val)	AP 16	604
Semantic segmentation	Cityscapes (val)	mIoU 18.7	572
Instance Segmentation	COCO (val)	APmk 8.71	485
Semantic segmentation	PASCAL VOC (val)	mIoU 53.8	380
Semantic segmentation	PASCAL Context (val)	mIoU 43.4	360

Showing 10 of 77 rows

...

Other info

Code
