Graph-RISE: Graph-Regularized Image Semantic Embedding
About
Learning image representations to capture fine-grained semantics has been a challenging and important task enabling many applications such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE effectively captures semantics and, compared to the state-of-the-art, differentiates nuances at levels that are closer to human perception.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Captioning | Conceptual Captions (dev) | CIDEr | 86.8 | 9 |
| Image Semantic Embedding | PIT (internal evaluation) | Triplet Accuracy | 87.16 | 5 |
| Image Semantic Embedding | GIT (internal evaluation) | Triplet Accuracy | 89.53 | 5 |
| kNN search accuracy | ImageNet | Top-1 Accuracy | 68.29 | 5 |
| kNN search accuracy | iNaturalist | Top-1 Accuracy | 31.12 | 5 |
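The two embedding metrics in the table, triplet accuracy and kNN top-1 accuracy, can be illustrated with a minimal sketch. It assumes Euclidean distance between embeddings and uses toy 2-D vectors; the function names and data are hypothetical and this is not the evaluation code behind the reported numbers.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets in which the
    anchor embedding is strictly closer to the positive than the negative."""
    correct = sum(
        1 for anchor, pos, neg in triplets
        if l2_distance(anchor, pos) < l2_distance(anchor, neg)
    )
    return correct / len(triplets)


def knn_top1_accuracy(queries, query_labels, index, index_labels):
    """Fraction of queries whose single nearest indexed embedding
    carries the same label as the query."""
    correct = 0
    for query, label in zip(queries, query_labels):
        nearest = min(range(len(index)),
                      key=lambda i: l2_distance(query, index[i]))
        if index_labels[nearest] == label:
            correct += 1
    return correct / len(queries)


# Toy embeddings (hypothetical data, for illustration only).
index = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
index_labels = ["cat", "cat", "dog"]
queries = [[0.2, 0.1], [4.5, 5.2], [2.0, 2.0]]
query_labels = ["cat", "dog", "dog"]

triplets = [
    ([0.0, 0.0], [1.0, 1.0], [5.0, 5.0]),  # positive is closer: counted correct
    ([5.0, 5.0], [0.0, 0.0], [4.0, 4.0]),  # negative is closer: counted wrong
]

print(triplet_accuracy(triplets))
print(knn_top1_accuracy(queries, query_labels, index, index_labels))
```

At realistic scale, the brute-force nearest-neighbor loop would normally be replaced by an approximate nearest-neighbor index; the linear scan here is only to keep the sketch self-contained.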