Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?
About
Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we incorporate additional inductive biases into self-supervised learning. We propose a new self-supervised representation learning method, ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views to avoid learning spurious correlations and obtain more informative representations. ReLICv2 achieves $77.1\%$ top-$1$ accuracy on ImageNet under linear evaluation on a ResNet50, improving the previous state of the art by an absolute $+1.5\%$; on larger ResNet models, ReLICv2 achieves up to $80.6\%$, outperforming previous self-supervised approaches by margins of up to $+2.3\%$. Most notably, ReLICv2 is the first unsupervised representation learning method to consistently outperform the supervised baseline in a like-for-like comparison over a range of ResNet architectures. Using ReLICv2, we also learn more robust and transferable representations that generalize better out-of-distribution than previous work, both on image classification and semantic segmentation. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
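The abstract describes a loss that pairs a contrastive objective with an explicit invariance term. The sketch below illustrates that combination under stated assumptions; it is not the ReLICv2 implementation. It assumes two views per image (ReLICv2 itself uses a larger, more varied set of views), cosine similarity scaled by a temperature `tau`, and a KL-divergence invariance penalty weighted by `alpha`. The function `relic_style_loss` and its parameters are hypothetical names chosen for illustration, and gradient details such as stop-gradient on one side of the KL are omitted because NumPy only computes the loss value.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def l2_normalize(x, eps=1e-8):
    # Project each embedding onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def relic_style_loss(z_a, z_b, tau=0.1, alpha=1.0):
    """Hypothetical contrastive + invariance loss for two views.

    z_a, z_b: (N, D) embeddings of two augmented views of the same N images;
    row i of z_a and row i of z_b form a positive pair.
    Returns (total, contrastive, invariance).
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    # logits_ab[i, j]: similarity of view a of image i to view b of image j.
    logits_ab = z_a @ z_b.T / tau
    # logits_ba[i, j]: similarity of view b of image i to view a of image j.
    logits_ba = z_b @ z_a.T / tau
    logp_ab = log_softmax(logits_ab, axis=1)
    logp_ba = log_softmax(logits_ba, axis=1)

    idx = np.arange(z_a.shape[0])
    # Contrastive term: cross-entropy with the matching image as the target,
    # averaged over both anchoring directions.
    contrastive = -0.5 * (logp_ab[idx, idx].mean() + logp_ba[idx, idx].mean())

    # Invariance term (assumed form): KL(p_ab || p_ba) per row, averaged.
    # It is zero when the similarity distribution does not depend on which
    # view acts as the anchor.
    p_ab = np.exp(logp_ab)
    invariance = (p_ab * (logp_ab - logp_ba)).sum(axis=1).mean()

    return contrastive + alpha * invariance, contrastive, invariance
```

With `alpha=0` the sketch reduces to a symmetric InfoNCE-style contrastive loss; increasing `alpha` penalizes representations whose similarity structure changes across views, which is the role the abstract assigns to the explicit invariance loss.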
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | PASCAL VOC 2012 (val) | Mean IoU | 77.9 | 2040 |
| Image Classification | ImageNet 1k (test) | -- | -- | 798 |
| Image Classification | CIFAR100 (test) | Top-1 Accuracy | 85.3 | 377 |
| Image Classification | Stanford Cars (test) | Accuracy | 92.3 | 306 |
| Image Classification | CIFAR10 (test) | Test Accuracy | 97.7 | 284 |
| Classification | CIFAR10 (test) | Accuracy | 90.2 | 266 |
| Image Classification | ImageNet (test) | Top-1 Acc | 80.6 | 235 |
| Image Classification | FGVC-Aircraft (test) | Accuracy | 88.7 | 231 |
| Image Classification | ImageNet-Sketch (test) | Top-1 Acc | 0.099 | 132 |
| Image Classification | Oxford Flowers-102 (test) | Top-1 Accuracy | 95.7 | 131 |