
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

About

In this paper we study the problem of image representation learning without human annotation. By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain compatibility across tasks, we introduce the context-free network (CFN), a siamese-ennead CNN. The CFN takes image tiles as input and explicitly limits the receptive field (or context) of its early processing units to one tile at a time. We show that the CFN has fewer parameters than AlexNet while preserving the same semantic learning capabilities. By training the CFN to solve Jigsaw puzzles, we learn both a feature mapping of object parts and their correct spatial arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. Our proposed method for learning visual representations outperforms state-of-the-art methods in several transfer learning benchmarks.

Mehdi Noroozi, Paolo Favaro • 2016
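
The pretext task described in the abstract can be illustrated with a minimal sketch of how one training example might be built: an image is cut into nine tiles (matching the nine branches of the siamese-ennead network), the tiles are shuffled by a permutation drawn from a fixed set, and the index of that permutation serves as the label to predict. The function names, the toy image, and the small randomly generated permutation set are assumptions made for illustration; they are not taken from the paper.

```python
import random

GRID = 3  # 3x3 grid -> nine tiles, one per branch of the nine-branch network


def split_into_tiles(image):
    """Split a 2D image (list of rows) into GRID*GRID tiles in row-major order."""
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // GRID, width // GRID
    return [
        [row[c * tile_w:(c + 1) * tile_w] for row in image[r * tile_h:(r + 1) * tile_h]]
        for r in range(GRID)
        for c in range(GRID)
    ]


def make_jigsaw_example(image, permutations, rng):
    """Return (shuffled_tiles, label) for one puzzle.

    label is the index of the permutation applied to the tiles; a network
    trained on this pretext task would predict it from the shuffled tiles.
    """
    tiles = split_into_tiles(image)
    label = rng.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


# Hypothetical setup: a toy 6x6 "image" and four random permutations.
rng = random.Random(0)
permutations = [rng.sample(range(GRID * GRID), GRID * GRID) for _ in range(4)]
image = [[r * 6 + c for c in range(6)] for r in range(6)]
shuffled, label = make_jigsaw_example(image, permutations, rng)
```

Each tile is handed over separately, which mirrors the abstract's point that early processing sees only one tile at a time; how the permutation set is chosen and how large it is are left open in this sketch.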

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Semantic segmentation | PASCAL VOC 2012 (val) | Mean IoU | 37.6 | 2204 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 45.7 | 1498 |
| Semantic segmentation | PASCAL VOC 2012 (test) | mIoU | 37.6 | 1477 |
| Image Classification | ImageNet (val) | Top-1 Acc | 44.6 | 1206 |
| Object Detection | PASCAL VOC 2007 (test) | mAP | 53.2 | 844 |
| Image Classification | ImageNet-1K | Top-1 Acc | 34.6 | 600 |
| Image Classification | ImageNet | Top-1 Accuracy | 45.7 | 431 |
| Action Recognition | UCF101 (test) | Accuracy | 51.5 | 357 |
| Action Recognition | UCF101 (mean of 3 splits) | Accuracy | 51.5 | 357 |
| Semantic segmentation | Pascal VOC | mIoU | 0.376 | 280 |
Showing 10 of 64 rows
