Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
About
The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset. However, public benchmarks generally provide only noisy GPS labels for the training images, which act as weak supervision for learning image-to-image similarities. Such label noise prevents deep neural networks from learning discriminative features for accurate localization. To tackle this challenge, we propose to self-supervise image-to-region similarities in order to fully explore the potential of difficult positive images alongside their sub-regions. The estimated image-to-region similarities serve as extra training supervision for improving the network over successive generations, which in turn gradually refines the fine-grained similarities to achieve optimal performance. Our proposed self-enhanced image-to-region similarity labels effectively deal with the training bottleneck in state-of-the-art pipelines without any additional parameters or manual annotations in either training or inference. Our method outperforms state-of-the-art methods on the standard localization benchmarks by noticeable margins and shows excellent generalization capability on multiple image retrieval datasets.
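
To make the idea of image-to-region similarity labels concrete, here is a minimal sketch, not the authors' implementation. It assumes that each region of a candidate image already has a descriptor vector, that similarity is cosine similarity, and that soft labels come from a temperature-scaled softmax. The function names, the temperature value of 0.07, and the toy descriptors are all hypothetical choices made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_region_labels(query_desc, region_descs, temperature=0.07):
    """Turn query-to-region similarities into a soft label distribution.

    Assumption: these labels would be computed with the network from the
    previous generation and used as targets for the next one.
    """
    sims = [cosine(query_desc, r) for r in region_descs]
    return softmax(sims, temperature)

def soft_label_loss(current_sims, target_labels, temperature=0.07):
    """Cross-entropy between the current similarity distribution and soft targets."""
    probs = softmax(current_sims, temperature)
    return -sum(t * math.log(p) for t, p in zip(target_labels, probs) if t > 0)

# Toy descriptors (hypothetical): one query and three candidate regions.
query = [1.0, 0.0, 0.0]
regions = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]

labels = soft_region_labels(query, regions)
current_sims = [cosine(query, r) for r in regions]
loss = soft_label_loss(current_sims, labels)
```

In this toy case the first region, which is nearly aligned with the query, receives most of the label mass, while the orthogonal region receives almost none. That graded weighting is what distinguishes soft similarity labels from a binary positive/negative label derived from GPS alone.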
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Place Recognition | MSLS (val) | Recall@1 | 70 | 236 |
| Visual Place Recognition | Pitts30k | Recall@1 | 89.4 | 164 |
| Visual Place Recognition | Tokyo24/7 | Recall@1 | 81 | 146 |
| Visual Place Recognition | MSLS Challenge | Recall@1 | 41.6 | 134 |
| Visual Place Recognition | Nordland | Recall@1 | 16.1 | 112 |
| Visual Place Recognition | SPED | Recall@1 | 80.2 | 106 |
| Visual Place Recognition | Pittsburgh30k (test) | Recall@1 | 89.4 | 86 |
| Visual Place Recognition | Pitts250k | Recall@1 | 90.4 | 84 |
| Visual Place Recognition | AmsterTime | Recall@1 | 29.7 | 83 |
| Visual Place Recognition | St Lucia | Recall@1 | 75.9 | 76 |
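
For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@N is commonly computed in visual place recognition: a query counts as correctly localized if at least one of its top-N retrieved reference images lies within a distance threshold of the query's true position. The 25 m threshold, the planar metre coordinates, and the function name are assumptions for illustration; each benchmark defines its own protocol.

```python
import math

def recall_at_n(rankings, query_pos, ref_pos, n=1, threshold_m=25.0):
    """Percentage of queries with a correct reference among the top-n results.

    rankings[q] lists reference indices for query q, best match first.
    Positions are (x, y) pairs in metres (assumed planar, e.g. UTM).
    """
    hits = 0
    for q, ranking in enumerate(rankings):
        top = ranking[:n]
        if any(math.dist(query_pos[q], ref_pos[r]) <= threshold_m for r in top):
            hits += 1
    return 100.0 * hits / len(rankings)

# Toy data (hypothetical): two queries, three reference images.
query_pos = [(0.0, 0.0), (100.0, 0.0)]
ref_pos = [(5.0, 0.0), (200.0, 0.0), (110.0, 0.0)]
rankings = [[0, 1, 2], [1, 2, 0]]

r1 = recall_at_n(rankings, query_pos, ref_pos, n=1)
r2 = recall_at_n(rankings, query_pos, ref_pos, n=2)
```

Here the first query's best match is 5 m away, so it is a hit at N=1, while the second query's best match is 100 m away and only its second-ranked reference (10 m away) is correct. The toy Recall@1 is therefore 50.0 and Recall@2 is 100.0.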