DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
About
Image retrieval is a fundamental task of retrieving images similar to a query image from a database. A common image retrieval practice is to first retrieve candidate images via similarity search using global image features and then re-rank the candidates by leveraging their local features. Previous learning-based studies mainly focus on either global or local image representation learning to tackle the retrieval task. In this paper, we abandon the two-stage paradigm and seek to design an effective single-stage solution by integrating local and global information inside images into compact image representations. Specifically, we propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval. It first attentively extracts representative local information with multi-atrous convolutions and self-attention. Components orthogonal to the global image representation are then extracted from the local information. Finally, the orthogonal components are concatenated with the global representation as a complement, and aggregation is performed to generate the final representation. The whole framework is end-to-end differentiable and can be trained with image-level labels. Extensive experimental results validate the effectiveness of our solution and show that our model achieves state-of-the-art image retrieval performance on the Revisited Oxford and Paris datasets.
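The orthogonal fusion step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `orthogonal_fusion`, `l2_normalize` and `rank_database` are hypothetical. The learned parts of the model (multi-atrous convolutions, self-attention, the global branch and any final projection layer) are omitted. Average pooling is assumed as the aggregation step.

Each local feature vector has its projection onto the global descriptor subtracted, leaving the part orthogonal to it. That remainder is concatenated with the global descriptor and pooled into one compact vector. Single-stage retrieval then ranks the database by cosine similarity of these vectors, with no separate local re-ranking stage.

```python
import numpy as np


def orthogonal_fusion(local_map: np.ndarray, global_vec: np.ndarray) -> np.ndarray:
    """Fuse a local feature map of shape (C, H, W) with a global descriptor of shape (C,).

    Assumption: aggregation is plain average pooling over spatial locations.
    """
    c, h, w = local_map.shape
    local = local_map.reshape(c, h * w)                     # (C, HW) one column per location
    g_norm_sq = float(global_vec @ global_vec) + 1e-12      # ||f_g||^2, epsilon avoids division by zero
    coeff = (global_vec @ local) / g_norm_sq                # (HW,) projection coefficient per location
    proj = np.outer(global_vec, coeff)                      # (C, HW) components parallel to f_g
    orth = local - proj                                     # (C, HW) components orthogonal to f_g
    glob = np.repeat(global_vec[:, None], h * w, axis=1)    # broadcast f_g to every location
    fused = np.concatenate([glob, orth], axis=0)            # (2C, HW) global + orthogonal local info
    return fused.mean(axis=1)                               # (2C,) aggregated final descriptor


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)


def rank_database(query_desc: np.ndarray, db_descs: np.ndarray) -> np.ndarray:
    """Single-stage retrieval: return database indices sorted by cosine similarity to the query."""
    q = l2_normalize(query_desc)
    db = l2_normalize(db_descs, axis=1)
    scores = db @ q
    return np.argsort(-scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 8, 4, 4  # toy sizes, chosen only for illustration
    db = np.stack([
        orthogonal_fusion(rng.normal(size=(C, H, W)), rng.normal(size=C))
        for _ in range(5)
    ])
    # A database descriptor used as its own query ranks first.
    print(rank_database(db[2], db))
```

Because projection is linear, the pooled local half of the descriptor stays orthogonal to the global half. This is why the concatenation adds complementary information instead of repeating what the global descriptor already encodes.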
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Retrieval | Revisited Oxford (ROxf) (Medium) | mAP | 82.4 | 124 |
| Image Retrieval | Revisited Paris (RPar) (Hard) | mAP | 80.3 | 115 |
| Image Retrieval | Revisited Paris (RPar) (Medium) | mAP | 91 | 100 |
| Image Retrieval | Revisited Oxford (ROxf) + R1M (Medium) | mAP | 77.4 | 95 |
| Image Retrieval | Revisited Oxford (ROxf) + R1M (Hard) | mAP | 54.8 | 83 |
| Image Retrieval | Revisited Paris (RPar) + R1M (Hard) | mAP | 66.7 | 82 |
| Image Retrieval | Revisited Oxford (ROxf) (Hard) | mAP | 61.1 | 81 |
| Image Retrieval | Revisited Paris (RPar) + R1M (Medium) | mAP | 83.3 | 74 |
| Image Retrieval | Tokyo 24/7 (test) | mAP | 75.4 | 34 |
| Global Localization | NCLT | Recall@1 | 54.5 | 10 |