
Dense Contrastive Learning for Self-Supervised Visual Pre-Training

About

Most existing self-supervised learning methods are designed and optimized for image classification. The resulting pre-trained models can be sub-optimal for dense prediction tasks because of the discrepancy between image-level and pixel-level prediction. To fill this gap, we design an effective dense self-supervised learning method that works directly at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of an input image. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (less than 1% slower) but demonstrates consistently superior performance when transferring to downstream dense prediction tasks, including object detection, semantic segmentation and instance segmentation, and it outperforms state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8% mIoU on Cityscapes semantic segmentation. Code is available at: https://git.io/AdelaiDet
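The pixel-level contrastive loss described above can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes an InfoNCE-style loss, assumes each local feature in one view is matched to its most similar local feature in the other view (the abstract only says that correspondence between local features is used), and assumes negatives come from a separate feature bank. The function name `dense_contrastive_loss`, the `negatives` argument and the temperature value are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def dense_contrastive_loss(q, k, negatives, temperature=0.2):
    """InfoNCE-style loss averaged over local features (a sketch, not the paper's code).

    q, k:       (N, D) local features of two views of one image, N = H * W.
    negatives:  (M, D) features used as negatives (assumed: a bank from other images).
    """
    q = l2_normalize(q)
    k = l2_normalize(k)
    negatives = l2_normalize(negatives)

    # Assumed correspondence rule: each query location is paired with the
    # key location that has the highest cosine similarity.
    sim = q @ k.T                                   # (N, N)
    match = sim.argmax(axis=1)                      # (N,)
    pos = sim[np.arange(q.shape[0]), match][:, None]  # (N, 1)

    neg = q @ negatives.T                           # (N, M)

    # Column 0 holds the positive logit; the rest are negatives.
    logits = np.concatenate([pos, neg], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())
```

For a 7x7 feature map with 16 channels, `q` and `k` would each have shape `(49, 16)`. The sketch uses the same features for matching and for the loss, and computes the loss in one direction only; a real training setup would likely differ on both points.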

Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei Li • 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
Semantic segmentation | ADE20K (val) | mIoU | 37.2 | 2731
Object Detection | COCO 2017 (val) | AP | 40.3 | 2454
Semantic segmentation | PASCAL VOC 2012 (val) | Mean IoU | 71.6 | 2040
Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 63.6 | 1453
Semantic segmentation | PASCAL VOC 2012 (test) | mIoU | 69.4 | 1342
Instance Segmentation | COCO 2017 (val) | APm | 0.357 | 1144
Video Object Segmentation | DAVIS 2017 (val) | J mean | 60.6 | 1130
Semantic segmentation | ADE20K | mIoU | 38.1 | 936
Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 63.6 | 840
Object Detection | COCO (val) | mAP | 37 | 613
Showing 10 of 63 rows
