Self-Supervised Learning of Pretext-Invariant Representations

About

The goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations for a large training set of images. Many pretext tasks lead to representations that are covariant with image transformations. We argue that, instead, semantic representations ought to be invariant under such transformations. Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as "pearl") that learns invariant representations based on pretext tasks. We use PIRL with a commonly used pretext task that involves solving jigsaw puzzles. We find that PIRL substantially improves the semantic quality of the learned image representations. Our approach sets a new state-of-the-art in self-supervised learning from images on several popular benchmarks for self-supervised learning. Despite being unsupervised, PIRL outperforms supervised pre-training in learning image representations for object detection. Altogether, our results demonstrate the potential of self-supervised learning of image representations with good invariance properties.
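The abstract describes the idea only at a high level. The sketch below illustrates it under stated assumptions and is not the authors' implementation. It assumes a jigsaw transform that cuts an image into a 3 x 3 grid of tiles and shuffles them. It also assumes an InfoNCE-style contrastive loss, using cosine similarity and a temperature of 0.07, that rewards the representation of an image for being close to the representation of its transformed version and far from the representations of other images. The function names `jigsaw_tiles`, `cosine` and `contrastive_loss` are hypothetical, the temperature is an assumed value, and the encoder that would turn images and tiles into vectors is omitted. The paper's own objective, including how negatives are obtained, may differ.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical illustration only: not the PIRL reference implementation.
Image = List[List[float]]  # toy grayscale image: a list of pixel rows


def jigsaw_tiles(image: Image, grid: int = 3,
                 rng: Optional[random.Random] = None) -> List[Image]:
    """Cut `image` into grid x grid tiles and return them in shuffled order.

    Tile size is floor(height / grid) by floor(width / grid), so any leftover
    rows or columns at the bottom/right edge are dropped.
    """
    rng = rng or random.Random()
    tile_h, tile_w = len(image) // grid, len(image[0]) // grid
    tiles = [
        [row[c * tile_w:(c + 1) * tile_w]
         for row in image[r * tile_h:(r + 1) * tile_h]]
        for r in range(grid)
        for c in range(grid)
    ]
    rng.shuffle(tiles)  # the "puzzle": the spatial order of the tiles is destroyed
    return tiles


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchor: Sequence[float], positive: Sequence[float],
                     negatives: List[Sequence[float]],
                     temperature: float = 0.07) -> float:
    """InfoNCE-style loss: -log of the softmax probability given to the positive.

    `anchor` stands for the representation of the original image, `positive`
    for the representation of its jigsaw-transformed version, and `negatives`
    for representations of other images. The loss is small when the anchor is
    much closer to the positive than to every negative, i.e. when the
    representation is approximately invariant to the transformation.
    The default temperature of 0.07 is an assumed value.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max before exp for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[0]
```

For example, `jigsaw_tiles` on a 6 x 6 image with `grid=3` returns nine 2 x 2 tiles that together contain every original pixel. The loss contrasts with a covariant pretext objective, which would ask the network to predict which permutation was applied. Here, the two views of the same image are instead pulled together, so the learned representation is encouraged to ignore the shuffle.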

Ishan Misra, Laurens van der Maaten • 2019

Related benchmarks

Task	Dataset	Metric	Result	
Object Detection	COCO 2017 (val)	AP	37.5	2843
Image Classification	ImageNet-1k (val)	Top-1 Accuracy	67.4	1498
Instance Segmentation	COCO 2017 (val)	--	--	1275
Image Classification	ImageNet (val)	Top-1 Accuracy	67.4	1206
Image Classification	ImageNet-1k (val)	Top-1 Accuracy	63.6	920
Image Classification	ImageNet 1k (test)	Top-1 Accuracy	63.6	880
Object Detection	PASCAL VOC 2007 (test)	mAP	73.4	844
Image Classification	ImageNet-1k (val)	Top-1 Accuracy	63.6	706
Image Classification	ImageNet-1K	Top-1 Accuracy	63.6	600
Image Classification	ImageNet ILSVRC-2012 (val)	Top-1 Accuracy	63.6	441
