Representation Learning by Learning to Count

About

We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such a relation rather than the transformations that match a given representation. In this paper, we use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. These two transformations are combined in one constraint and used to train a neural network with a contrastive loss. The proposed task produces representations that perform on par with or exceed the state of the art in transfer learning benchmarks.

Mehdi Noroozi, Hamed Pirsiavash, Paolo Favaro • 2017
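
The constraint described in the abstract can be illustrated with a small sketch. It is only one plausible reading, not the authors' implementation: the counts of a downscaled image are asked to match the summed counts of its tiles, and a margin term pushes the counts of a different image away from that sum. The helper names (`downsample`, `tiles`, `sq_dist`, `counting_loss`), the 2x2 tiling, the average-pool downscaling, the squared-distance form of the loss, and the margin value are all assumptions made for illustration; `phi` stands in for the neural network.

```python
def downsample(img, factor=2):
    """Average-pool a 2D image (list of rows) by `factor` along each axis.

    Assumes height and width are divisible by `factor`.
    """
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            / (factor * factor)
            for c in range(0, w, factor)
        ]
        for r in range(0, h, factor)
    ]


def tiles(img, grid=2):
    """Split a 2D image into grid x grid non-overlapping tiles (row-major)."""
    th, tw = len(img) // grid, len(img[0]) // grid
    return [
        [row[j * tw:(j + 1) * tw] for row in img[i * th:(i + 1) * th]]
        for i in range(grid)
        for j in range(grid)
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length count vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def counting_loss(phi, x, y, margin=10.0):
    """Hypothetical contrastive counting loss for image x and a different image y.

    phi maps an image to a list of non-negative "counts".
    """
    tile_counts = [phi(t) for t in tiles(x)]
    # Sum the count vectors of all tiles component by component.
    tile_sum = [sum(vals) for vals in zip(*tile_counts)]
    # Scaling + tiling: counts of the downscaled x should equal the tile sum.
    agree = sq_dist(phi(downsample(x)), tile_sum)
    # Contrastive term: counts of a different image y should differ by a margin.
    differ = sq_dist(phi(downsample(y)), tile_sum)
    return agree + max(0.0, margin - differ)


# Toy 4x4 images and two stand-in feature functions (both hypothetical).
x = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [[0] * 4 for _ in range(4)]
zero_phi = lambda img: [0.0]                       # trivial all-zero features
sum_phi = lambda img: [float(sum(map(sum, img)))]  # total pixel intensity

print(counting_loss(zero_phi, x, y))
print(counting_loss(sum_phi, x, y))
```

In this toy setting the all-zero feature function satisfies the counting constraint exactly but still pays the full margin, which suggests why a contrastive term is useful: without it, the trivial solution would minimize the loss.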

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100 (test)	--	3518
Semantic segmentation	PASCAL VOC 2012 (val)	Mean IoU 36.6	2204
Image Classification	ImageNet-1k (val)	Top-1 Accuracy 34.3	1498
Semantic segmentation	PASCAL VOC 2012 (test)	mIoU 36.6	1477
Image Classification	CIFAR-10 (test)	Accuracy 50.9	906
Object Detection	PASCAL VOC 2007 (test)	mAP 51.4	844
Image Classification	SVHN (test)	Accuracy 63.4	470
Semantic segmentation	Pascal VOC	mIoU 0.366	280
Classification	PASCAL VOC 2007 (test)	mAP (%) 67.7	217
Scene Classification	Places 205 categories (test)	Top-1 Acc 0.363	150

Showing 10 of 21 rows
