From Coarse to Fine: Robust Hierarchical Localization at Large Scale
About
Robust and accurate visual localization is a fundamental capability for numerous applications, such as autonomous driving, mobile robotics, or augmented reality. It remains, however, a challenging task, particularly for large-scale environments and in the presence of significant appearance changes. State-of-the-art methods not only struggle with such scenarios, but are often too resource-intensive for certain real-time applications. In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. We exploit the coarse-to-fine localization paradigm: we first perform a global retrieval to obtain location hypotheses and only later match local features within those candidate places. This hierarchical approach incurs significant runtime savings and makes our system suitable for real-time operation. By leveraging learned descriptors, our method achieves remarkable localization robustness across large variations of appearance and sets a new state of the art on two challenging benchmarks for large-scale localization.
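The coarse-to-fine paradigm described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not HF-Net's implementation: the function names (`retrieve_candidates`, `mutual_nn_matches`, `localize_coarse_to_fine`), the cosine-similarity retrieval, and the mutual nearest-neighbour matching are assumptions chosen for brevity, and all descriptors are taken as precomputed arrays rather than CNN outputs. The final 6-DoF pose estimation from the matches is omitted.

```python
import numpy as np


def retrieve_candidates(query_global, db_globals, k=3):
    """Coarse step: indices of the k database images closest to the query.

    Similarity is cosine similarity between global descriptors (an assumption).
    """
    q = query_global / np.linalg.norm(query_global)
    db = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    sims = db @ q
    # Sort by descending similarity and keep the top k.
    return np.argsort(-sims)[:k]


def mutual_nn_matches(query_local, db_local):
    """Fine step: mutual nearest-neighbour matches between two descriptor sets.

    Returns an (M, 2) array of (query_index, database_index) pairs.
    """
    # Pairwise Euclidean distances, shape (num_query, num_db).
    dists = np.linalg.norm(query_local[:, None, :] - db_local[None, :, :], axis=2)
    nn_of_query = dists.argmin(axis=1)  # best database feature per query feature
    nn_of_db = dists.argmin(axis=0)     # best query feature per database feature
    q_idx = np.arange(len(query_local))
    keep = nn_of_db[nn_of_query] == q_idx  # keep only pairs that agree both ways
    return np.stack([q_idx[keep], nn_of_query[keep]], axis=1)


def localize_coarse_to_fine(query_global, query_local, db_globals, db_locals, k=3):
    """Retrieve k candidate images, then match local features only within them.

    Returns the candidate with the most matches and its match array.
    """
    candidates = retrieve_candidates(query_global, db_globals, k)
    matches = {
        int(i): mutual_nn_matches(query_local, db_locals[int(i)])
        for i in candidates
    }
    best = max(matches, key=lambda i: len(matches[i]))
    return best, matches[best]
```

Local matching runs against only `k` candidates instead of the whole database, which is where the runtime savings of the hierarchical approach come from. In a full pipeline the returned matches would feed a pose solver to recover the 6-DoF camera pose.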
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Localization | Aachen Day-Night v1.1 (Night) | Success Rate (0.25m, 2°) | 73.3 | 58 |
| Visual Localization | Aachen Day-Night v1.1 (Day) | SR (0.25m, 2°) | 88.1 | 50 |
| Camera Localization | 7 Scenes | -- | -- | 46 |
| Visual Localization | 7Scenes (test) | Chess Median Angular Error (°) | 0.84 | 41 |
| Visual Localization | RobotCar Seasons (night) | Recall (0.25m, 2°) | 33.3 | 35 |
| Visual Localization | Cambridge Landmarks (test) | Avg Median Positional Error (m) | 0.356 | 35 |
| Visual Localization | Extended CMU Seasons Urban | Recall @ (0.25m, 2°) | 95.5 | 34 |
| Camera Relocalization | 7-Scenes (test) | Median Translation Error (cm) | 3 | 30 |
| Visual Localization | 7scenes indoor | Positional Error (Chess, cm) | 2 | 30 |
| Visual Localization | Cambridge Landmarks | King's Positional Error (cm) | 11 | 28 |