Divide and Contrast: Self-supervised Learning from Uncurated Data
About
Self-supervised learning holds promise for leveraging large amounts of unlabeled data; however, much of its progress has thus far been limited to highly curated pre-training data such as ImageNet. We explore the effects of contrastive learning on larger, less-curated image datasets such as YFCC, and find that there is indeed a large difference in the resulting representation quality. We hypothesize that this curation gap is due to a shift in the distribution of image classes -- which is more diverse and heavy-tailed -- resulting in less relevant negative samples to learn from. We test this hypothesis with a new approach, Divide and Contrast (DnC), which alternates between contrastive learning and clustering-based hard negative mining. When pretrained on less-curated datasets, DnC greatly improves the performance of self-supervised learning on downstream tasks, while remaining competitive with the current state of the art on curated datasets.
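The core idea behind the alternation can be sketched in a few lines: cluster the current embeddings, then restrict each anchor's negatives to samples from its own cluster, so the negatives are semantically closer and therefore harder. This is a minimal illustrative sketch, not the paper's implementation; the `kmeans` and `infonce_within_cluster` helpers and all hyperparameters here are assumptions for illustration only.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Simple k-means to partition embeddings into k clusters (illustrative)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def infonce_within_cluster(anchors, positives, labels, tau=0.1):
    """InfoNCE-style loss where each anchor's negatives are restricted to
    its own cluster, yielding harder (more similar) negative samples."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    losses = []
    for i in range(len(a)):
        mask = labels == labels[i]            # candidates from the same cluster
        sims = a[i] @ p[mask].T / tau         # similarities to in-cluster samples
        pos = a[i] @ p[i] / tau               # similarity to own positive view
        # -log softmax of the positive over the in-cluster candidates
        losses.append(-pos + np.log(np.exp(sims - sims.max()).sum()) + sims.max())
    return float(np.mean(losses))
```

In a full training loop this clustering step would be re-run periodically on fresh embeddings, so the partition tracks the evolving representation.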
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic Segmentation | ADE20K (val) | mIoU 39.2 | 2731 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy 75.8 | 1453 |
| Video Object Segmentation | DAVIS 2017 (val) | J mean 63.1 | 1130 |
| Semantic Segmentation | ADE20K | mIoU 39.2 | 936 |
| Object Detection | COCO (val) | mAP 43.9 | 613 |
| Action Recognition | UCF101 (test) | -- | 307 |
| Image Classification | Stanford Cars (test) | Accuracy 75.3 | 306 |
| Instance Segmentation | COCO | AP (mask) 37.2 | 279 |
| Image Classification | CIFAR10 (test) | Accuracy 91.7 | 266 |
| Image Classification | ImageNet (test) | Top-1 Accuracy 70.7 | 235 |