MultiGrain: a unified image embedding for classes and instances
About
MultiGrain is a network architecture producing compact vector representations that are suited to both image classification and particular object retrieval. It builds on a standard classification trunk. The top of the network produces an embedding containing coarse and fine-grained information, so that images can be recognized by object class, by particular object, or as distorted copies of one another. Our joint training is simple: we minimize a cross-entropy loss for classification and a ranking loss that determines whether two images are identical up to data augmentation, with no need for additional labels. A key component of MultiGrain is a pooling layer that takes advantage of high-resolution images with a network trained at a lower resolution. When fed to a linear classifier, the learned embeddings provide state-of-the-art classification accuracy. For instance, we obtain 79.4% top-1 accuracy with a ResNet-50 trained on ImageNet, which is a +1.8% absolute improvement over the AutoAugment method. When compared using cosine similarity, the same embeddings perform on par with the state of the art for image retrieval at moderate resolutions.
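As a rough illustration of the joint objective and the cosine-similarity comparison described above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the hinge-style pair loss, the `margin` and `weight` values, and all function names are assumptions chosen for illustration, since the abstract only states that a cross-entropy loss is combined with a ranking loss over augmented pairs.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def pair_ranking_loss(emb_a, emb_b, same_image, margin=0.5):
    """Hypothetical pair loss (an assumption, not the paper's exact loss).

    Two augmentations of the same image are pulled toward similarity 1;
    embeddings of different images are penalized only above `margin`.
    """
    sim = cosine_similarity(emb_a, emb_b)
    if same_image:
        return 1.0 - sim
    return max(0.0, sim - margin)


def joint_loss(logits, label, emb_a, emb_b, same_image, weight=0.5):
    """Weighted sum of the classification and ranking terms.

    `weight` is an illustrative trade-off parameter, not a reported value.
    """
    cls = cross_entropy(logits, label)
    rank = pair_ranking_loss(emb_a, emb_b, same_image)
    return weight * cls + (1.0 - weight) * rank


def rank_by_cosine(query, database):
    """Return database indices sorted by decreasing cosine similarity."""
    sims = [(cosine_similarity(query, e), i) for i, e in enumerate(database)]
    return [i for _, i in sorted(sims, key=lambda t: -t[0])]
```

At retrieval time only `rank_by_cosine` would be needed: the query embedding is compared against every database embedding, and no class labels are involved.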
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet (val) | Top-1 Acc | 83.6 | 1206 |
| Image Classification | ImageNet 2012 (val) | Top-1 Accuracy | 83.1 | 202 |
| Image Retrieval | Holidays | mAP | 87.9 | 115 |
| Instance-level search | ROxford (test) | mAP | 32.9 | 36 |
| Copy detection | INRIA Copydays strong 10k YFCC100M distractors | mAP | 82.5 | 25 |
| Image Copy Detection | DISC 2021 (val) | µAP | 20.5 | 14 |
| Image Retrieval | UKB | Score (top-4) | 3.91 | 12 |
| Instance Search | Holidays (val) | mAP | 92.5 | 10 |
| Instance Search | CD10k | mAP | 82.5 | 5 |