Boosting Self-Supervised Learning via Knowledge Transfer
About
In self-supervised learning, one trains a model to solve a so-called pretext task on a dataset without the need for human annotation. The main objective, however, is to transfer this model to a target domain and task. Currently, the most effective transfer strategy is fine-tuning, which restricts one to using the same model, or parts thereof, for both the pretext and target tasks. In this paper, we present a novel framework for self-supervised learning that overcomes limitations in designing and comparing different tasks, models, and data domains. In particular, our framework decouples the structure of the self-supervised model from the final task-specific fine-tuned model. This allows us to: 1) quantitatively assess previously incompatible models, including handcrafted features; 2) show that deeper neural network models can learn better representations from the same pretext task; 3) transfer knowledge learned with a deep model to a shallower one and thus boost its learning. We use this framework to design a novel self-supervised task, which achieves state-of-the-art performance on common benchmarks (PASCAL VOC 2007, ILSVRC12, and Places) by a significant margin. Our learned features shrink the mAP gap between models trained via self-supervised learning and supervised learning from 5.9% to 2.6% in object detection on PASCAL VOC 2007.
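Point 3 above, transferring knowledge from a deep pretext model to a shallower target model, is the core decoupling idea. Below is a minimal sketch of one way such cross-architecture transfer can be implemented, assuming pseudo-labels obtained by clustering the deep model's features; this is an illustrative sketch under that assumption, not necessarily the paper's exact procedure, and all module names, sizes, and cluster counts are placeholders.

```python
# Hypothetical sketch of cross-architecture knowledge transfer:
# cluster a deep model's features into pseudo-labels, then train a
# shallower model to predict them. Architectures are toy placeholders.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Stand-in for a deep network already trained on a pretext task.
deep_encoder = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Stand-in for the shallower target model to be boosted.
num_clusters = 16  # toy value; in practice this would be much larger
shallow_model = nn.Sequential(
    nn.Conv2d(3, 32, 11, stride=4, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, num_clusters),
)

# 1) Extract features from the frozen deep pretext model.
images = torch.randn(256, 3, 224, 224)  # placeholder for unlabeled data
with torch.no_grad():
    feats = deep_encoder(images).numpy()

# 2) Cluster the features; cluster indices serve as pseudo-labels.
pseudo_labels = torch.from_numpy(
    KMeans(n_clusters=num_clusters, n_init=10).fit_predict(feats)
).long()

# 3) Train the shallow model to predict the pseudo-labels, transferring
#    the deep representation across architectures.
opt = torch.optim.SGD(shallow_model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(10):  # toy training loop on a single batch
    opt.zero_grad()
    loss = loss_fn(shallow_model(images), pseudo_labels)
    loss.backward()
    opt.step()
```

Because the shallow model only ever sees images and pseudo-labels, its architecture is entirely independent of the pretext model's, which is what makes the framework's comparisons across tasks, models, and data domains possible.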
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic Segmentation | PASCAL VOC 2012 (val) | Mean IoU | 42.6 | 2040 |
| Object Detection | PASCAL VOC 2007 (test) | mAP | 56.5 | 821 |
| Domain Generalization | PACS (test) | Average Accuracy | 59.57 | 225 |
| Classification | PASCAL VOC 2007 (test) | mAP (%) | 72.5 | 217 |
| Image Classification | Places | -- | -- | 72 |
| Image Classification | ILSVRC 12 | Top-1 Accuracy | 19.2 | 31 |
| Linear Classification | ImageNet official (val) | Accuracy | 37.3 | 19 |
| Linear Classification | Places205 official (val) | Accuracy | 37.5 | 14 |
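The two Linear Classification rows follow the standard linear-probing protocol: the self-supervised encoder is frozen and only a linear classifier is trained on its features. A minimal sketch of that protocol follows, assuming a generic frozen PyTorch encoder; the encoder, dimensions, and data are placeholders, not the paper's setup.

```python
# Linear probing sketch: train only a linear classifier on top of a
# frozen encoder. Encoder and dimensions are illustrative placeholders.
import torch
import torch.nn as nn

feature_dim, num_classes = 256, 1000  # e.g., 1000 classes for ImageNet

# Frozen stand-in encoder; in practice, the self-supervised network
# with its pretrained weights fixed.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))
for p in encoder.parameters():
    p.requires_grad = False

probe = nn.Linear(feature_dim, num_classes)  # the only trainable part
opt = torch.optim.SGD(probe.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(64, 3, 32, 32)            # placeholder batch
labels = torch.randint(0, num_classes, (64,))  # placeholder labels
for _ in range(5):  # toy loop; real runs train for many epochs
    opt.zero_grad()
    with torch.no_grad():
        feats = encoder(images)  # features from the frozen encoder
    loss = loss_fn(probe(feats), labels)
    loss.backward()
    opt.step()
```

Because the encoder never receives gradients, the probe's accuracy measures the quality of the learned representation itself rather than the benefit of fine-tuning.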