DetCo: Unsupervised Contrastive Learning for Object Detection

About

Unsupervised contrastive learning achieves great success in learning image representations with CNNs. Unlike most recent methods that focus on improving image classification accuracy, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between the global image and local image patches to learn discriminative representations for object detection. DetCo has several appealing benefits. (1) It is carefully designed by investigating the weaknesses of current self-supervised methods, which discard important representations for object detection. (2) DetCo builds hierarchical intermediate contrastive losses between the global image and local patches to improve object detection, while maintaining global representations for image recognition. Theoretical analysis shows that the local patches actually remove the contextual information of an image, improving the lower bound of mutual information for better contrastive learning. (3) Extensive experiments on PASCAL VOC, COCO and Cityscapes demonstrate that DetCo outperforms state-of-the-art methods not only on object detection but also on segmentation, pose estimation, and 3D shape prediction, while remaining competitive on image classification. For example, on PASCAL VOC, DetCo-100ep achieves 57.4 mAP, which is on par with the result of MoCov2-800ep. Moreover, DetCo consistently outperforms the supervised method by 1.6/1.2/1.0 AP on Mask R-CNN-C4/FPN/RetinaNet with a 1x schedule. Code will be released at https://github.com/xieenze/DetCo.
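
The contrast between global image and local patches described in the abstract can be illustrated with a small InfoNCE-style sketch. This is not the authors' implementation: the function names, the temperature value of 0.2, the use of a shared set of negatives, and the unweighted sum of three terms (global-to-global, local-to-local, global-to-local) are all assumptions made for illustration. It also covers a single feature level, whereas the abstract describes hierarchical losses at intermediate levels.

```python
import math


def l2_normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(query, positive_key, negative_keys, temperature=0.2):
    # InfoNCE loss for one query: negative log-softmax of the positive similarity
    # among the positive and all negative keys. The temperature is an assumed value.
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in negative_keys]
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def global_local_loss(global_q, global_k, local_q, local_k, negatives, temperature=0.2):
    # Hypothetical combination of three contrastive terms for one feature level:
    # global vs. global, local vs. local, and global vs. local.
    g2g = info_nce(global_q, global_k, negatives, temperature)
    l2l = info_nce(local_q, local_k, negatives, temperature)
    g2l = info_nce(global_q, local_k, negatives, temperature)
    return g2g + l2l + g2l
```

In this sketch, `global_q` and `global_k` stand for embeddings of two augmented views of the whole image, and `local_q` and `local_k` for embeddings aggregated from patches of those views. The loss is small when matching views are similar to each other and dissimilar to the negatives.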

Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU 37.3 | 2731 |
| Object Detection | COCO 2017 (val) | AP 40.1 | 2454 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy 68.6 | 1453 |
| Instance Segmentation | COCO 2017 (val) | -- | 1144 |
| Video Object Segmentation | DAVIS 2017 (val) | J mean 57 | 1130 |
| Semantic segmentation | ADE20K | mIoU 37.8 | 936 |
| Semantic segmentation | Cityscapes | mIoU 76.5 | 578 |
| Instance Segmentation | COCO | APmask 36.4 | 279 |
| Object Detection | COCO | AP50 (Box) 61 | 190 |
| Semantic segmentation | Pascal VOC | mIoU 0.726 | 172 |
Showing 10 of 16 rows
