Learning Camera Localization via Dense Scene Matching

About

Camera localization aims to estimate 6-DoF camera poses from RGB images. Traditional methods detect and match interest points between a query image and a pre-built 3D model. Recent learning-based approaches encode scene structures into a scene-specific convolutional neural network (CNN) and are thus able to predict dense coordinates from RGB images. However, most of them require re-training or re-adaptation for a new scene and have difficulty handling large-scale scenes due to limited network capacity. We present a new method for scene-agnostic camera localization using dense scene matching (DSM), where a cost volume is constructed between a query image and a scene. The cost volume and the corresponding coordinates are processed by a CNN to predict dense coordinates. Camera poses can then be solved by PnP algorithms. In addition, our method can be extended to the temporal domain, which yields an additional performance boost at test time. Our scene-agnostic approach achieves accuracy comparable to existing scene-specific approaches, such as KFNet, on the 7-Scenes and Cambridge benchmarks. It also substantially outperforms SANet, the state-of-the-art scene-agnostic dense coordinate regression network. The code is available at https://github.com/Tangshitao/Dense-Scene-Matching.
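
To make the matching step concrete, here is a minimal sketch of the cost-volume idea under simplifying assumptions; it is not the authors' implementation. The function names `cost_volume` and `soft_coordinates`, the cosine-similarity cost, and the `temperature` parameter are all hypothetical, and the softmax-weighted average only stands in for the CNN that the paper uses to turn the cost volume into dense coordinates.

```python
import numpy as np


def cost_volume(query_feats, scene_feats):
    """Cosine similarity between every query pixel and every scene point.

    query_feats: (H, W, C) feature map of the query image (assumed given).
    scene_feats: (N, C) features of N scene points (assumed given).
    Returns an (H, W, N) cost volume.
    """
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + 1e-8)
    s = scene_feats / (np.linalg.norm(scene_feats, axis=-1, keepdims=True) + 1e-8)
    return q @ s.T


def soft_coordinates(cost, scene_xyz, temperature=0.05):
    """Softmax-weighted average of scene coordinates for each pixel.

    This is a hand-written stand-in for the learned CNN, used only to show
    how a cost volume plus 3D coordinates can yield a dense coordinate map.
    cost: (H, W, N); scene_xyz: (N, 3). Returns (H, W, 3).
    """
    logits = cost / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ scene_xyz


# Toy example: 4 scene points with orthogonal features and known 3D positions.
scene_feats = np.eye(4)
scene_xyz = np.array([[0.0, 0.0, 1.0],
                      [1.0, 0.0, 2.0],
                      [0.0, 1.0, 3.0],
                      [1.0, 1.0, 4.0]])
# A 2x2 query image whose pixels match scene points 0, 1, 2, 3 in order.
query_feats = scene_feats.reshape(2, 2, 4)

cost = cost_volume(query_feats, scene_feats)
coords = soft_coordinates(cost, scene_xyz)
```

The resulting per-pixel 3D coordinates, paired with their 2D pixel locations and the camera intrinsics, are the kind of 2D-3D correspondences that a PnP solver such as OpenCV's `cv2.solvePnPRansac` takes as input to recover the camera pose.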

Shitao Tang, Chengzhou Tang, Rui Huang, Siyu Zhu, Ping Tan • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Camera Localization | 7 Scenes | -- | 46 |
| Visual Localization | 7scenes indoor | Positional Error (Chess, cm): 2 | 30 |
| Visual Localization | Cambridge Landmarks Church | Median Translation Error (m): 0.34 | 23 |
| Visual Localization | Cambridge Landmarks College | Median Translation Error (m): 0.35 | 23 |
| Camera Localization | Cambridge Landmarks outdoor | King's College Rotation Error (°): 0.35 | 20 |
| Visual Localization | Cambridge Landmarks OldHospital | Median Translation Error (m): 0.23 | 9 |
| Visual Localization | Cambridge Landmarks ShopFacade | Median Translation Error: 0.3 | 9 |
| Visual Localization | Cambridge Landmarks Court | Median Translation Error (m): 0.43 | 6 |
| Scene Coordinate Regression | 7 Scenes | Run time (s): 0.21 | 2 |