From Coarse to Fine: Robust Hierarchical Localization at Large Scale
About
Robust and accurate visual localization is a fundamental capability for numerous applications, such as autonomous driving, mobile robotics, or augmented reality. It remains, however, a challenging task, particularly for large-scale environments and in the presence of significant appearance changes. State-of-the-art methods not only struggle with such scenarios, but are often too resource-intensive for certain real-time applications. In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. We exploit the coarse-to-fine localization paradigm: we first perform a global retrieval to obtain location hypotheses and only later match local features within those candidate places. This hierarchical approach incurs significant runtime savings and makes our system suitable for real-time operation. By leveraging learned descriptors, our method achieves remarkable localization robustness across large variations of appearance and sets a new state of the art on two challenging benchmarks for large-scale localization.
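The coarse-to-fine paradigm described in the abstract can be illustrated with a minimal NumPy sketch. This is not the HF-Net implementation: the function names (`retrieve_candidates`, `match_local`, `localize_coarse_to_fine`), the cosine-similarity retrieval, and the ratio-test matcher are all assumptions chosen for illustration, and the descriptors are taken as given rather than predicted by a CNN. The final 6-DoF pose estimation from the matches is omitted.

```python
import numpy as np


def retrieve_candidates(query_global, db_global, k=2):
    """Coarse step: rank database images by global-descriptor similarity.

    Assumes all global descriptors are L2-normalized, so the dot product
    equals the cosine similarity.
    """
    sims = db_global @ query_global
    return np.argsort(-sims)[:k]


def match_local(query_desc, db_desc, ratio=0.9):
    """Fine step: nearest-neighbor matching of local descriptors with a ratio test.

    Assumes db_desc holds at least two descriptors. The ratio threshold is a
    hypothetical default, not a value taken from the paper.
    """
    # Pairwise Euclidean distances, shape (num_query, num_db).
    dists = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(query_desc))
    # Keep a match only if it is clearly better than the runner-up.
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]


def localize_coarse_to_fine(query_global, query_local, db_global, db_local, k=2):
    """Run retrieval first, then match local features only in the top-k candidates."""
    candidates = retrieve_candidates(query_global, db_global, k)
    matches = {int(c): match_local(query_local, db_local[int(c)]) for c in candidates}
    return candidates, matches
```

The runtime saving comes from the second step touching only `k` database images instead of all of them. In a full pipeline, the resulting 2D-3D correspondences would then be passed to a robust pose solver such as PnP inside a RANSAC loop.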
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Localization | Aachen Day-Night v1.1 (Day) | SR (0.25m, 2°) | 88.1 | 70 |
| Visual Localization | Aachen Day-Night v1.1 (Night) | Success Rate (0.25m, 2°) | 73.3 | 69 |
| Visual Localization | 7Scenes (test) | Chess Median Angular Error (°) | 0.84 | 61 |
| Visual Localization | Cambridge Landmarks OldHospital | Median Translation Error (m) | 0.3 | 51 |
| Camera Localization | 7 Scenes | -- | -- | 46 |
| Visual Localization | RobotCar Seasons (night) | Recall (0.25m, 2°) | 33.3 | 35 |
| Visual Localization | Cambridge Landmarks College | Median Translation Error (m) | 0.12 | 35 |
| Visual Localization | Cambridge Landmarks Church | Median Translation Error (m) | 0.07 | 35 |
| Visual Localization | Cambridge Landmarks (test) | Avg Median Positional Error (m) | 0.356 | 35 |
| Visual Localization | Extended CMU Seasons Urban | Recall @ (0.25m, 2°) | 95.5 | 34 |