
Pixel-Perfect Structure-from-Motion with Featuremetric Refinement

About

Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per image once and for all, which can yield poorly localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing step. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at https://github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.
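To make the core idea concrete, here is a minimal NumPy sketch of featuremetric keypoint refinement: a keypoint in a target view is nudged so that the dense feature sampled at its location matches the reference descriptor, by descending the featuremetric error. All function names here are illustrative, and the finite-difference gradient descent stands in for the paper's actual dense CNN features and second-order solver inside COLMAP.

```python
import numpy as np

def bilinear(fmap, x, y):
    """Bilinearly interpolate an (H, W, C) feature map at subpixel (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return (fmap[y0, x0] * (1 - dx) * (1 - dy)
            + fmap[y0, x0 + 1] * dx * (1 - dy)
            + fmap[y0 + 1, x0] * (1 - dx) * dy
            + fmap[y0 + 1, x0 + 1] * dx * dy)

def refine_keypoint(fmap_ref, ref_xy, fmap_tgt, init_xy,
                    steps=50, lr=0.1, eps=1e-3):
    """Shift init_xy in the target view to minimize the featuremetric
    error ||F_tgt(x, y) - F_ref(ref_xy)||^2, using finite-difference
    gradient descent (a toy optimizer, not the paper's solver)."""
    f_ref = bilinear(fmap_ref, *ref_xy)  # reference descriptor, held fixed

    def err(x, y):
        r = bilinear(fmap_tgt, x, y) - f_ref
        return float(np.sum(r ** 2))

    x, y = init_xy
    for _ in range(steps):
        # numerical gradient of the squared featuremetric error
        gx = (err(x + eps, y) - err(x - eps, y)) / (2 * eps)
        gy = (err(x, y + eps) - err(x, y - eps)) / (2 * eps)
        x, y = x - lr * gx, y - lr * gy
    return x, y
```

In the full system this residual is summed over all observations of a 3D track and jointly minimized, first over keypoint locations (before matching-based geometry) and later over points and camera poses in a featuremetric bundle adjustment.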

Philipp Lindenberger, Paul-Edouard Sarlin, Viktor Larsson, Marc Pollefeys • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
Pose Estimation | KITTI odometry | AUC5 | 84.34 | 51
Pose Estimation | ScanNet | AUC @ 5 deg | 21.25 | 41
Multi-view pose regression | CO3D v2 | RRA@15 | 33.7 | 31
Camera pose estimation | CO3D v2 | AUC@30 | 30.1 | 29
3D Triangulation | ETH3D (train) | Accuracy (1 cm) | 79.01 | 24
Camera pose estimation | IMC | AUC (3° threshold) | 0.4519 | 20
Structure-from-Motion | IMC 2021 | AUC (3° threshold) | 46.3 | 17
Multi-View Camera Pose Estimation | ETH3D | AUC@1° | 0.5435 | 16
Multi-View Camera Pose Estimation | IMC Dataset | AUC @ 3° | 45.19 | 16
Multi-View Camera Pose Estimation | Texture-Poor SfM Dataset | AUC (3° threshold) | 20.66 | 16
Showing 10 of 24 rows
