Visual Localization via Few-Shot Scene Region Classification
About
Visual (re)localization addresses the problem of estimating the 6-DoF (degree-of-freedom) camera pose of a query image captured in a known scene, a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by using neural networks to memorize the mapping from image pixels to scene coordinates, which yields 2D-3D correspondences for camera pose optimization. However, such memorization requires training on large numbers of posed images for each scene, which is costly and inefficient. In contrast, a few images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach that achieves fast and effective scene memorization with few-shot images. Our key insight is to leverage a) a pre-learned feature extractor, b) a scene region classifier, and c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes. Code is available at: https://github.com/siyandong/SRC
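The abstract describes turning per-pixel predictions into 2D-3D correspondences for pose optimization. The sketch below illustrates that step under a simplifying assumption that is not taken from the paper: each scene region is represented by a single 3D centroid, and a pixel's correspondence is the centroid of its highest-scoring region. The function name `build_correspondences`, the `min_confidence` threshold, and the input layout are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def build_correspondences(
    pixels: Sequence[Point2D],
    region_scores: Sequence[Sequence[float]],
    region_centers: Dict[int, Point3D],
    min_confidence: float = 0.5,
) -> List[Tuple[Point2D, Point3D]]:
    """Turn per-pixel region classification scores into 2D-3D matches.

    Hypothetical simplification: each scene region is represented by the
    3D centroid of the map points assigned to it.
    """
    matches = []
    for pixel, scores in zip(pixels, region_scores):
        # Pick the most likely scene region for this pixel.
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Skip ambiguous predictions; they would mostly end up as outliers.
        if scores[best] < min_confidence:
            continue
        matches.append((pixel, region_centers[best]))
    return matches


centers = {0: (0.0, 0.0, 2.0), 1: (1.0, 0.5, 3.0), 2: (-1.0, 0.2, 4.0)}
pixels = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
scores = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.2, 0.7]]
matches = build_correspondences(pixels, scores, centers)
# The second pixel is dropped because its best score (0.4) is below 0.5.
```

In a full pipeline, the resulting matches would be passed to a robust pose solver such as PnP inside RANSAC (for example, OpenCV's `cv2.solvePnPRansac`); the confidence filter simply discards ambiguous pixels before that stage.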
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Localization | Cambridge Landmarks | King's Positional Error (cm) | 39 | 28 |
| Visual Relocalization | Cambridge Landmarks | Position Error (King's, cm) | 39 | 14 |
| Indoor Relocalization | 7Scenes D-SLAM poses | Success Rate (%, 5 cm / 5°) | 55.2 | 11 |
| Indoor Relocalization | 7Scenes SfM poses | Success Rate (%, 5 cm / 5°) | 81.1 | 9 |
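The 7Scenes rows report the share of query images localized within 5 cm and 5°. The sketch below shows how such a success rate is commonly computed from estimated and ground-truth poses. It assumes rotations are given as 3x3 matrices and camera centres in metres; the function names and the inclusive (`<=`) threshold convention are assumptions, not taken from the benchmark's official tooling.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]
# (R_est, c_est, R_gt, c_gt): rotations and camera centres in metres.
PosePair = Tuple[Matrix3, Vector3, Matrix3, Vector3]


def rotation_error_deg(r_est: Matrix3, r_gt: Matrix3) -> float:
    """Angle of the relative rotation R_est^T R_gt, in degrees."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard acos against rounding error.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def position_error_cm(c_est: Vector3, c_gt: Vector3) -> float:
    """Euclidean distance between camera centres, metres -> centimetres."""
    return 100.0 * math.dist(c_est, c_gt)


def success_rate(
    pairs: List[PosePair], max_cm: float = 5.0, max_deg: float = 5.0
) -> float:
    """Percentage of query images within max_cm AND max_deg (inclusive, assumed)."""
    if not pairs:
        return 0.0
    hits = sum(
        1
        for r_est, c_est, r_gt, c_gt in pairs
        if position_error_cm(c_est, c_gt) <= max_cm
        and rotation_error_deg(r_est, r_gt) <= max_deg
    )
    return 100.0 * hits / len(pairs)
```

Both thresholds must hold at once, so a pose with a small rotation error but a 10 cm position error still counts as a failure.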