Visual Localization via Few-Shot Scene Region Classification

About

Visual (re)localization addresses the problem of estimating the 6-DoF (degrees of freedom) camera pose of a query image captured in a known scene, a key building block of many computer vision and robotics applications. Recent structure-based localization methods solve this problem by training neural networks to memorize the mapping from image pixels to scene coordinates, which yields the 2D-3D correspondences used for camera pose optimization. However, such memorization requires training on a large number of posed images in each scene, which is costly and inefficient. In contrast, a few images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images. Our insight is to leverage a) a pre-learned feature extractor, b) a scene region classifier, and c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes. Code available at: https://github.com/siyandong/SRC

Siyan Dong, Shuzhe Wang, Yixin Zhuang, Juho Kannala, Marc Pollefeys, Baoquan Chen • 2022

Related benchmarks

Task	Dataset	Result
Visual Localization	Cambridge Landmarks	King's Positional Error (cm): 39	48
Visual Relocalization	Cambridge Landmarks	Position Error (King's, cm): 39	14
Indoor Relocalization	7Scenes D-SLAM poses	Success Rate (5cm/5deg): 55.2	11
Indoor Relocalization	7Scenes SfM poses	Success Rate (5cm, 5°): 81.1	9
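
The success-rate metric in the table counts a query frame as localized when its estimated pose is within 5 cm and 5° of the ground truth. The sketch below shows one common way to compute such errors. It is illustrative only and not taken from the authors' code: the function names `pose_errors` and `success_rate` are hypothetical, poses are assumed to be camera-to-world with translations in metres, and the inclusive thresholds are an assumption.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (translation error in metres, rotation error in degrees).

    Assumes 3x3 rotation matrices and 3-vectors describing
    camera-to-world poses (an assumption of this sketch).
    """
    t_err = float(np.linalg.norm(np.asarray(t_est, dtype=float)
                                 - np.asarray(t_gt, dtype=float)))
    # Angle of the relative rotation, recovered from its trace.
    R_rel = np.asarray(R_est, dtype=float).T @ np.asarray(R_gt, dtype=float)
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err = float(np.degrees(np.arccos(cos_angle)))
    return t_err, r_err

def success_rate(errors, max_t=0.05, max_r=5.0):
    """Percentage of frames whose errors are within both thresholds.

    `errors` is a list of (translation_m, rotation_deg) pairs;
    the defaults correspond to the 5 cm / 5 deg criterion.
    """
    ok = [t <= max_t and r <= max_r for t, r in errors]
    return 100.0 * sum(ok) / len(ok)
```

For example, `success_rate([(0.01, 1.0), (0.10, 1.0)])` counts one of the two frames as localized. Outdoor results such as the Cambridge Landmarks rows report a position error in centimetres rather than a thresholded rate.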
