
CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation

About

Recent advances in self-supervised contrastive learning yield good image-level representations, which favor classification tasks but usually neglect pixel-level detail, leading to unsatisfactory transfer performance on dense prediction tasks such as semantic segmentation. In this work, we propose a pixel-wise contrastive learning method called CP2 (Copy-Paste Contrastive Pretraining), which facilitates both image- and pixel-level representation learning and is therefore better suited to downstream dense prediction tasks. In detail, we copy-paste a random crop from an image (the foreground) onto different background images and pretrain a semantic segmentation model with the objective of 1) distinguishing the foreground pixels from the background pixels, and 2) identifying the composed images that share the same foreground. Experiments show the strong performance of CP2 in downstream semantic segmentation: by finetuning CP2-pretrained models on PASCAL VOC 2012, we obtain 78.6% mIoU with a ResNet-50 and 79.5% with a ViT-S.
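The copy-paste composition step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name, toy image shapes, and paste coordinates are hypothetical, and the two contrastive training objectives themselves are omitted. It only shows how one foreground crop pasted onto two different backgrounds yields a pair of composed views with identical foreground pixels and a per-pixel foreground/background mask.

```python
import numpy as np

rng = np.random.default_rng(0)

def copy_paste(foreground, background, top, left):
    """Paste a foreground crop onto a background image.

    Returns the composed image and a boolean mask marking which
    pixels belong to the foreground.
    """
    h, w = foreground.shape[:2]
    composed = background.copy()
    composed[top:top + h, left:left + w] = foreground
    mask = np.zeros(background.shape[:2], dtype=bool)
    mask[top:top + h, left:left + w] = True
    return composed, mask

# Toy data: one shared foreground crop, two distinct backgrounds.
crop = rng.random((32, 32, 3))
bg_a = rng.random((64, 64, 3))
bg_b = rng.random((64, 64, 3))

view_a, mask_a = copy_paste(crop, bg_a, top=8, left=8)
view_b, mask_b = copy_paste(crop, bg_b, top=20, left=16)

# The two composed views share exactly the same foreground pixels...
assert np.allclose(view_a[mask_a], view_b[mask_b])
# ...while their background pixels come from different images.
assert not np.allclose(view_a[~mask_a], view_b[~mask_b])
```

In the pretraining objective, the masks would supervise the pixel-level task (foreground vs. background), while the shared crop makes the two composed views a positive pair for the image-level contrastive task.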

Feng Wang, Huiyu Wang, Chen Wei, Alan Yuille, Wei Shen • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | ImageNet (val) | -- | 1206 |
| Video Object Segmentation | DAVIS 2017 (val) | J mean 51.3 | 1130 |
| Semantic Segmentation | ADE20K | mIoU 25.4 | 936 |
| Semantic Segmentation | PASCAL VOC (val) | mIoU 65.2 | 338 |
| Semantic Segmentation | COCO Stuff (val) | mIoU 46.5 | 126 |
| Semantic Segmentation | COCO Object (val) | mIoU 0.594 | 77 |
| Semantic Segmentation | VOC 2012 (val) | mIoU 63.1 | 67 |
| Unsupervised Semantic Segmentation | PASCAL VOC 2012 (val) | mIoU 9.5 | 15 |
| Unsupervised Segmentation | COCO-Things (val) | mIoU 12.9 | 13 |
| Unsupervised Segmentation | COCO Stuff (val) | mIoU 13.6 | 13 |
