EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
About
The task of Visual Place Recognition (VPR) is to predict the location of a query image from a database of geo-tagged images. Recent studies in VPR have highlighted the significant advantage of employing pre-trained foundation models like DINOv2 for the VPR task. However, these models are often deemed inadequate for VPR without further fine-tuning on VPR-specific data. In this paper, we present an effective approach to harness the potential of a foundation model for VPR. We show that features extracted from self-attention layers can act as a powerful re-ranker for VPR, even in a zero-shot setting. Our method not only outperforms previous zero-shot approaches but also achieves results competitive with several supervised methods. We then show that a single-stage approach utilizing internal ViT layers for pooling can produce global features that achieve state-of-the-art performance, with impressive feature compactness down to 128D. Moreover, integrating our local foundation features for re-ranking further widens this performance gap. Our method also demonstrates exceptional robustness and generalization, setting new state-of-the-art performance while handling challenging conditions such as occlusion, day-night transitions, and seasonal variations.
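The two-stage idea in the abstract (retrieve candidates with global features, then re-rank them with local features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, the features are plain NumPy arrays standing in for ViT outputs, and the re-ranking score is a simple count of mutual nearest-neighbour matches, which may differ from the matching scheme used in the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def mutual_nn_count(query_local, db_local):
    """Count mutual nearest-neighbour matches between two sets of local features.

    query_local: (Nq, D) array, db_local: (Nd, D) array (hypothetical inputs).
    """
    sim = l2_normalize(query_local) @ l2_normalize(db_local).T  # (Nq, Nd)
    nn_q2d = sim.argmax(axis=1)  # best database feature for each query feature
    nn_d2q = sim.argmax(axis=0)  # best query feature for each database feature
    q_idx = np.arange(sim.shape[0])
    # A match is mutual when the database feature points back to the same query feature.
    return int(np.sum(nn_d2q[nn_q2d] == q_idx))

def retrieve_and_rerank(query_global, db_global, query_local, db_local_list, top_k=5):
    """Stage 1: rank the database by global cosine similarity and keep top_k.
    Stage 2: reorder those candidates by their mutual-match count."""
    sims = l2_normalize(db_global) @ l2_normalize(query_global)
    candidates = np.argsort(-sims)[:top_k]
    scores = np.array([mutual_nn_count(query_local, db_local_list[i]) for i in candidates])
    # A stable sort keeps the global ranking as the tie-breaker.
    order = np.argsort(-scores, kind="stable")
    return candidates[order]
```

In this sketch the global stage only narrows the search, so the cost of local matching grows with `top_k` rather than with the database size.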
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Place Recognition | MSLS (val) | Recall@1 | 92.8 | 236 |
| Visual Place Recognition | Pitts30k | Recall@1 | 93.9 | 164 |
| Visual Place Recognition | Tokyo24/7 | Recall@1 | 98.7 | 146 |
| Visual Place Recognition | MSLS Challenge | Recall@1 | 79 | 134 |
| Visual Place Recognition | Nordland | Recall@1 | 95 | 112 |
| Visual Place Recognition | SPED | Recall@1 | 93.1 | 106 |
| Visual Place Recognition | Pittsburgh30k (test) | Recall@1 | 94.8 | 86 |
| Visual Place Recognition | AmsterTime | Recall@1 | 65.5 | 83 |
| Visual Place Recognition | St Lucia | R@1 | 100 | 76 |
| Visual Place Recognition | Nordland | Recall@1 | 95 | 72 |
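
The Recall@1 figures above follow the usual retrieval convention: a query counts as correctly localized if at least one of its top-K retrieved database images is a true match, and Recall@K is the percentage of such queries. A minimal sketch of that computation follows; the function name and input layout are assumptions for illustration, and how true matches are defined depends on each dataset's protocol.

```python
def recall_at_k(predictions, positives, ks=(1, 5, 10)):
    """Compute Recall@K as a percentage.

    predictions: for each query, a ranked list of database indices (best first).
    positives:   for each query, the set of database indices that are true matches.
    """
    hits = {k: 0 for k in ks}
    for ranked, pos in zip(predictions, positives):
        for k in ks:
            # The query is a hit at K if any of its top-K results is a true match.
            if any(idx in pos for idx in ranked[:k]):
                hits[k] += 1
    n = len(predictions)
    return {k: 100.0 * hits[k] / n for k in ks}

# Toy example with four queries (made-up indices, not benchmark data).
predictions = [[3, 1, 2], [0, 2, 1], [1, 0, 3], [2, 3, 0]]
positives = [{3}, {1}, {5}, {3, 2}]
scores = recall_at_k(predictions, positives, ks=(1, 3))
```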