
What Should Not Be Contrastive in Contrastive Learning

About

Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations. However, these methods implicitly assume a particular set of representational invariances (e.g., invariance to color), and can perform poorly when a downstream task violates this assumption (e.g., distinguishing red vs. yellow cars). We introduce a contrastive learning framework which does not require prior knowledge of specific, task-dependent invariances. Our model learns to capture varying and invariant factors for visual representations by constructing separate embedding spaces, each of which is invariant to all but one augmentation. We use a multi-head network with a shared backbone which captures information across each augmentation and alone outperforms all baselines on downstream tasks. We further find that the concatenation of the invariant and varying spaces performs best across all tasks we investigate, including coarse-grained, fine-grained, and few-shot downstream classification tasks, and various data corruptions.
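The abstract describes a shared backbone feeding several embedding heads, each invariant to all but one augmentation, with the final representation being the concatenation of all spaces. A minimal NumPy sketch of that idea follows; it is not the authors' implementation, and every name and dimension (`backbone`, `n_heads`, the noise-based mock augmentations, the per-head view pairing) is an illustrative assumption — in the paper, head k's positive pairs actually share augmentation k's parameters, which this sketch simplifies away.

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(x, W):
    """Shared encoder: a single ReLU linear map standing in for a CNN."""
    return np.maximum(x @ W, 0.0)

def head(h, P):
    """One projection head; its output defines one embedding space."""
    z = h @ P
    return z / np.linalg.norm(z, axis=-1, keepdims=True)  # unit sphere

def info_nce(z_a, z_b, tau=0.1):
    """Standard InfoNCE between two views; positives on the diagonal."""
    logits = z_a @ z_b.T / tau
    log_probs = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    idx = np.arange(len(z_a))
    return -log_probs[idx, idx].mean()

# Illustrative sizes: batch, input dim, feature dim, embedding dim, heads.
n, d_in, d_feat, d_emb, n_heads = 8, 32, 64, 16, 3
W = rng.normal(size=(d_in, d_feat)) * 0.1
# One "all-invariant" head plus one head per augmentation type.
heads = [rng.normal(size=(d_feat, d_emb)) * 0.1 for _ in range(n_heads + 1)]

x = rng.normal(size=(n, d_in))
# Two augmented views per image (augmentations mocked as additive noise).
view_a = x + 0.05 * rng.normal(size=x.shape)
view_b = x + 0.05 * rng.normal(size=x.shape)

h_a, h_b = backbone(view_a, W), backbone(view_b, W)
losses = [info_nce(head(h_a, P), head(h_b, P)) for P in heads]

# The downstream representation concatenates every embedding space:
# the invariant head plus each augmentation-varying head.
representation = np.concatenate([head(h_a, P) for P in heads], axis=1)
print(representation.shape)  # (8, 64): (n_heads + 1) spaces of 16 dims
```

In practice each per-head loss would use a view pairing tailored to its augmentation, and all losses would be summed and backpropagated through the shared backbone; the concatenated representation is what the paper reports as performing best downstream.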

Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell • 2020

Related benchmarks

Task                               | Dataset          | Metric         | Result | Rank
Fine-grained Visual Classification | NABirds (test)   | Top-1 Accuracy | 74.45  | 157
Image Classification               | ImageNet-100     | --             | --     | 84
Fine-grained Image Classification  | CUB-200 (test)   | Accuracy       | 68.71  | 45
Fine-grained Image Classification  | CUB              | Top-1 Accuracy | 66.42  | 22
Fine-grained Image Classification  | NABirds          | --             | --     | 22
Fine-grained Image Classification  | Cars             | Top-1 Accuracy | 75.69  | 20
Fine-grained Classification        | Cars (test)      | Accuracy       | 85.9   | 13
Fine-grained Image Classification  | Aircrafts (test) | Top-1 Accuracy | 81.75  | 11
Fine-grained Image Classification  | Aircrafts        | Top-1 Accuracy | 72.49  | 10
