MultiLoc: Multi-view Guided Relative Pose Regression for Fast and Robust Visual Re-Localization
About
Relative Pose Regression (RPR) generalizes well to unseen environments, but its accuracy is often limited because it reasons only over pairwise, spatially local views. To address this, we propose MultiLoc, a novel multi-view guided RPR model trained at scale, which equips relative pose regression with globally consistent spatial and geometric understanding. Specifically, our method jointly fuses multiple reference views and their associated camera poses in a single forward pass, enabling accurate zero-shot pose estimation with real-time efficiency. To reliably supply informative context, we further propose a co-visibility-driven retrieval strategy that selects geometrically relevant reference views. MultiLoc sets a new state of the art (SOTA) in visual re-localization, consistently outperforming existing RPR methods across diverse datasets, including Wayspots, Cambridge Landmarks, and Indoor6. Furthermore, MultiLoc's pose regressor achieves SOTA performance in relative pose estimation, surpassing RPR, feature-matching, and non-regression-based techniques on the MegaDepth-1500, ScanNet-1500, and ACID benchmarks. These results demonstrate robust domain generalization of MultiLoc across indoor, outdoor, and natural environments. Code will be made publicly available.
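As background for the abstract, the sketch below illustrates the classic pairwise RPR idea of recovering a query camera pose by chaining a known reference pose with a predicted relative pose. It is not MultiLoc's implementation: the helper names (`pose_from_yaw`, `compose_query_pose`), the 4x4 homogeneous-matrix convention, and the toy values are all assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose_from_yaw(yaw_deg: float, tx: float, ty: float, tz: float) -> Matrix:
    """Build a rigid transform: rotation about the z-axis plus a translation."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose_query_pose(world_from_ref: Matrix, ref_from_query: Matrix) -> Matrix:
    """Chain the known reference pose with a (predicted) relative pose."""
    return matmul(world_from_ref, ref_from_query)


# Hypothetical values: a reference camera with a known pose in the world frame,
# and a relative pose that a regressor might predict for the query image.
world_from_ref = pose_from_yaw(90.0, 1.0, 2.0, 0.0)
ref_from_query = pose_from_yaw(0.0, 1.0, 0.0, 0.0)  # one unit along the reference x-axis
world_from_query = compose_query_pose(world_from_ref, ref_from_query)
```

With a single reference view, any error in the predicted relative pose carries straight through to the query pose, which is one way to read the abstract's motivation for fusing several reference views and their poses.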
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Relative Pose Estimation | MegaDepth 1500 | AUC @ 20° | 89.34 | 151 |
| Camera pose estimation | ACID | AUC @ 5° | 0.7224 | 30 |
| Relative Camera Pose Evaluation | ScanNet1500 | AUC@5 | 55.63 | 23 |
| Visual Re-localization | Cambridge Landmarks | Average Positional Error (4 Scenes) | 0.27 | 11 |
| Visual Re-localization | Wayspots (test) | -- | -- | 6 |
| Visual Re-localization | Indoor6 | Scene 1 Translation Error (cm) | 2.6 | 3 |
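For readers unfamiliar with the AUC metrics in the table, the following is a minimal sketch of the commonly used pose-error AUC: the area under the recall-versus-error curve up to an angular threshold, normalized by that threshold. It assumes each image pair has already been reduced to a single angular error in degrees (often the larger of the rotation and translation-direction errors). Whether each listed benchmark follows exactly this protocol is an assumption, and `pose_auc` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import List, Sequence


def pose_auc(errors_deg: Sequence[float], thresholds_deg: Sequence[float]) -> List[float]:
    """Normalized area under the recall-vs-error curve for each threshold."""
    n = len(errors_deg)
    if n == 0:
        raise ValueError("errors_deg must not be empty")
    # Prepend the origin so the curve starts at (error=0, recall=0).
    errors = [0.0] + sorted(errors_deg)
    recall = [i / n for i in range(n + 1)]
    aucs = []
    for t in thresholds_deg:
        # Keep the curve points with error <= t, then extend flat to the threshold.
        last = bisect_right(errors, t)
        xs = errors[:last] + [t]
        ys = recall[:last] + [recall[last - 1]]
        # Trapezoidal integration over the truncated curve.
        area = sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
                   for i in range(len(xs) - 1))
        aucs.append(area / t)
    return aucs


# Toy errors (degrees) for three image pairs, purely illustrative.
example = pose_auc([1.0, 2.0, 10.0], [5.0, 20.0])
```

The result lies in [0, 1]. Reports on a 0 to 100 scale multiply it by 100, which would explain why some rows in the table show values such as 89.34 and others show fractions such as 0.7224.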