Grafit: Learning fine-grained image representations with coarse labels

About

This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying images at a finer granularity than that available at train time. It also improves the accuracy for transfer learning tasks to fine-grained datasets, thereby establishing the new state of the art on five public benchmarks, like iNaturalist-2018.
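As a rough illustration of the retrieval setting described above, the sketch below classifies a query by its nearest neighbors in an embedding space, once at the coarse level and once at a finer one. It is not the authors' training objective or code: the embeddings, the label names, and the helpers `cosine` and `knn_predict` are all hypothetical and chosen only to make the idea concrete.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def knn_predict(query, gallery, labels, k=1):
    """Majority label among the k gallery vectors most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: cosine(query, gallery[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up 2-D embeddings standing in for network outputs (assumption).
gallery = [(1.0, 0.1), (0.9, 0.3), (0.1, 1.0), (0.3, 0.9)]
coarse_labels = ["dog", "dog", "cat", "cat"]            # available at train time
fine_labels = ["husky", "poodle", "tabby", "siamese"]   # used only for evaluation

query = (0.95, 0.28)
pred_coarse = knn_predict(query, gallery, coarse_labels, k=3)
pred_fine = knn_predict(query, gallery, fine_labels, k=1)
print(pred_coarse, pred_fine)
```

In this toy setup the fine prediction is only meaningful if the embedding separates images within a coarse class, which is the property the abstract says the method aims to learn.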

Hugo Touvron, Alexandre Sablayrolles, Matthijs Douze, Matthieu Cord, Hervé Jégou • 2020

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet-1k (val)	Top-1 Accuracy 79.6	1498
Image Classification	Stanford Cars	Accuracy 94.7	705
Image Classification	ImageNet-1K	Top-1 Acc 79.6	600
Image Classification	ImageNet	Top-1 Accuracy 79.6	431
Fine-grained Image Classification	Stanford Cars (test)	Accuracy 94.7	372
Image Classification	Stanford Cars (test)	Accuracy 92.5	320
Image Classification	iNaturalist 2018	Top-1 Accuracy 81.2	291
Image Classification	Oxford Flowers 102	Accuracy 99	244
Image Classification	Oxford Flowers-102 (test)	Top-1 Accuracy 99.1	221
Image Classification	Flowers-102	Top-1 Acc 99.1	198

Showing 10 of 35 rows
