MeshLoc: Mesh-Based Visual Localization

About

Visual localization, i.e., the problem of camera pose estimation, is a central component of applications such as autonomous robots and augmented reality systems. A dominant approach in the literature, shown to scale to large scenes and to handle complex illumination and seasonal changes, is based on local features extracted from images. The scene representation is a sparse Structure-from-Motion point cloud that is tied to a specific local feature. Switching to another feature type requires an expensive feature matching step between the database images used to construct the point cloud. In this work, we thus explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation. We show that this approach can achieve state-of-the-art results. We further show that surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage, and even when rendering raw scene geometry without color or texture. Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.

Vojtech Panek, Zuzana Kukelova, Torsten Sattler • 2022

Related benchmarks

Task	Dataset	Metric	Result	
Visual Localization	Aachen Day-Night v1.1 (Day)	SR (0.25m, 2°)	84.2	70
Visual Localization	Aachen Day-Night v1.1 (Night)	Success Rate (0.25m, 2°)	70.2	69
Sequential Visual Localization	ScanNet cross-temporal v2	Recall @ 0.25m	97	18
Sequential Visual Localization	3RScan v1 (test)	Recall @ 0.25m distance	69	18
Visual Localization	3RScan	Average Storage per Scene (MB)	701.3	6
Visual Localization	ScanNet	Average Storage per Scene (MB)	1.06e+4	6
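
Thresholded metrics such as Success Rate (0.25m, 2°) count the share of query images whose estimated camera pose lies within both a position and a rotation tolerance of the ground truth. The sketch below illustrates that idea only. It assumes the common convention of measuring position error as the distance between camera centers and rotation error as the angle of the relative rotation; the function names are hypothetical, and the exact evaluation protocol of each benchmark listed here may differ.

```python
import math


def pose_errors(R_est, c_est, R_gt, c_gt):
    """Return (position error in metres, rotation error in degrees).

    R_* are 3x3 rotation matrices given as nested lists and c_* are camera
    centers. This convention is an assumption made for illustration.
    """
    pos_err = math.dist(c_est, c_gt)
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding before taking acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return pos_err, math.degrees(math.acos(cos_angle))


def success_rate(estimates, ground_truth, max_pos_m=0.25, max_rot_deg=2.0):
    """Percentage of poses within both thresholds (hypothetical helper)."""
    hits = 0
    for (R_e, c_e), (R_g, c_g) in zip(estimates, ground_truth):
        pos_err, rot_err = pose_errors(R_e, c_e, R_g, c_g)
        if pos_err <= max_pos_m and rot_err <= max_rot_deg:
            hits += 1
    return 100.0 * hits / len(ground_truth)


def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees, used for the toy data."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


# Toy data: one pose inside the (0.25m, 2 deg) tolerance, one outside it.
gt = [(rot_z(0), (0.0, 0.0, 0.0)), (rot_z(0), (0.0, 0.0, 0.0))]
est = [(rot_z(1), (0.1, 0.0, 0.0)), (rot_z(5), (0.0, 0.0, 0.0))]
rate = success_rate(est, gt)
```

On this toy data, the first estimate is off by 0.1 m and 1° and counts as a success, while the second is rotated by 5° and does not.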
