Grafit: Learning fine-grained image representations with coarse labels
About
This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying images at a finer granularity than that available at train time. It also improves the accuracy for transfer learning tasks to fine-grained datasets, thereby establishing a new state of the art on five public benchmarks, such as iNaturalist-2018.
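To make the nearest-neighbor idea mentioned in the abstract more concrete, the following is a minimal sketch of a soft-voting k-NN classifier over embeddings. It is a generic illustration, not the paper's training objective: the function name `knn_predict`, the parameters `k` and `temperature`, and the use of cosine similarity are all assumptions chosen for the example. It shows how labels of a finer granularity could be assigned at test time from a labeled bank of embeddings, whatever labels the embedding network itself was trained with.

```python
import numpy as np

def knn_predict(query, bank, bank_labels, num_classes, k=3, temperature=0.05):
    """Soft-voting k-NN over L2-normalized embeddings (illustrative sketch only)."""
    bank_labels = np.asarray(bank_labels)
    # Normalize rows so that dot products are cosine similarities.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = q @ b.T  # shape: (num_queries, num_bank)
    # Indices and similarities of the k most similar bank items per query.
    topk = np.argsort(-sims, axis=1)[:, :k]
    topk_sims = np.take_along_axis(sims, topk, axis=1)
    # Temperature-scaled softmax weights over the k neighbors.
    w = np.exp((topk_sims - topk_sims.max(axis=1, keepdims=True)) / temperature)
    w /= w.sum(axis=1, keepdims=True)
    # Accumulate neighbor weights into per-class scores.
    scores = np.zeros((query.shape[0], num_classes))
    for i in range(query.shape[0]):
        np.add.at(scores[i], bank_labels[topk[i]], w[i])
    return scores.argmax(axis=1), scores
```

Here `bank` would hold embeddings of images whose fine-grained labels are known, and `query` the embeddings of images to classify. The temperature (an assumed hyperparameter) controls how strongly the closest neighbors dominate the vote.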
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 79.6 | 1469 |
| Image Classification | Stanford Cars | Accuracy | 94.7 | 635 |
| Image Classification | ImageNet-1K | Top-1 Acc | 79.6 | 600 |
| Image Classification | ImageNet | Top-1 Accuracy | 79.6 | 431 |
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy | 94.7 | 348 |
| Image Classification | Stanford Cars (test) | Accuracy | 92.5 | 316 |
| Image Classification | iNaturalist 2018 | Top-1 Accuracy | 81.2 | 291 |
| Image Classification | Oxford Flowers 102 | Accuracy | 99 | 234 |
| Image Classification | Flowers-102 | Top-1 Acc | 99.1 | 198 |
| Image Classification | Oxford Flowers-102 (test) | Top-1 Accuracy | 99.1 | 192 |