LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images
About
Visual localization estimates the 6-DoF (degrees of freedom) camera pose of a query image and is a fundamental component of many computer vision and robotics tasks. This paper presents LoGS, a vision-based localization pipeline that uses 3D Gaussian Splatting (GS) as its scene representation, which enables high-quality novel view synthesis. During mapping, structure-from-motion (SfM) is applied first, followed by generation of a GS map. During localization, an initial pose is obtained through image retrieval and local feature matching coupled with a PnP solver, and a high-precision pose is then obtained by analysis-by-synthesis refinement on the GS map. Experimental results on four large-scale datasets demonstrate the approach's state-of-the-art (SoTA) accuracy in camera pose estimation and its robustness under challenging few-shot conditions.
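To make the coarse localization step more concrete, here is a minimal sketch of recovering a camera pose from 2D-3D correspondences, the role a PnP solver plays after feature matching. The abstract does not say which PnP solver LoGS uses; this sketch swaps in a plain Direct Linear Transform (DLT) in NumPy for illustration. The function name `pnp_dlt` and its arguments are hypothetical, and a practical system would more likely wrap a minimal solver in RANSAC to handle outlier matches.

```python
import numpy as np

def pnp_dlt(points_3d, points_2d, K):
    """Estimate a world-to-camera pose (R, t) from >= 6 2D-3D matches.

    Illustrative DLT only (an assumption, not the solver used in the paper):
    no outlier rejection, and the 3D points must not be coplanar.
    """
    n = points_3d.shape[0]
    # Remove the intrinsics so that x_norm ~ R @ X + t.
    uv1 = np.hstack([points_2d, np.ones((n, 1))])
    x_norm = (np.linalg.inv(K) @ uv1.T).T
    X_h = np.hstack([points_3d, np.ones((n, 1))])

    # Two linear equations per match in the 12 entries of P = [R | t].
    A = np.zeros((2 * n, 12))
    for i in range(n):
        x, y = x_norm[i, 0], x_norm[i, 1]
        A[2 * i, 0:4] = X_h[i]
        A[2 * i, 8:12] = -x * X_h[i]
        A[2 * i + 1, 4:8] = X_h[i]
        A[2 * i + 1, 8:12] = -y * X_h[i]

    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # P is only defined up to scale and sign: project the left 3x3 block
    # onto the nearest orthogonal matrix and rescale the translation.
    U, S, Vt_m = np.linalg.svd(P[:, :3])
    R = U @ Vt_m
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:  # flip sign so R is a proper rotation
        R, t = -R, -t
    return R, t
```

In a coarse-to-fine pipeline of the kind the abstract describes, a pose like this would only be the starting point; the refinement against the GS map is a separate step not shown here.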
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Localization | 7Scenes Pumpkin | Median Translation Error (cm) | 0.7 | 25 |
| Visual Localization | 7Scenes RedKitchen | Median Translation Error (cm) | 0.5 | 25 |
| Visual Localization | 7Scenes (Office) | Median Translation Error (cm) | 0.7 | 25 |
| Visual Localization | 7Scenes Chess | Median Translation Error (cm) | 0.4 | 25 |
| Visual Localization | 7Scenes Fire | Median Translation Error (cm) | 0.6 | 25 |
| Visual Localization | 7Scenes Stairs | Median Translation Error (cm) | 1.6 | 25 |
| Visual Localization | 7Scenes Heads | Median Translation Error (cm) | 0.5 | 25 |
| Relocalization | 7-Scenes Average | Median Translation Error (cm) | 0.76 | 18 |
| Visual Relocalization | Cambridge Landmarks (test) | College Median Translation Error (cm) | 11 | 17 |
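
The benchmark rows above report a median translation error in centimetres. As a rough illustration of how such a number is commonly computed, the sketch below takes estimated and ground-truth camera positions and returns the median Euclidean distance between them. It assumes positions are camera centres expressed in metres; the function name `median_translation_error_cm` is hypothetical, and the exact evaluation protocol behind the listed values is not stated on this page.

```python
import numpy as np

def median_translation_error_cm(est_positions, gt_positions):
    """Median Euclidean distance between estimated and true camera centres.

    Assumes both inputs are (N, 3) arrays in metres (an assumption made
    for this sketch); the result is converted to centimetres.
    """
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    errors_cm = 100.0 * np.linalg.norm(est - gt, axis=1)
    return float(np.median(errors_cm))
```

The median is used rather than the mean because a few badly failed queries would otherwise dominate the score.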