Pixel-Perfect Structure-from-Motion with Featuremetric Refinement
About
Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing step. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at https://github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.
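The core idea above, minimizing a featuremetric error by moving a keypoint in a dense feature map until its descriptor matches a reference, can be illustrated with a toy sketch. This is not the paper's implementation (which uses learned CNN features and a Levenberg-Marquardt solver inside COLMAP); it is a minimal stand-in using NumPy, bilinear interpolation, and numerical gradient descent, with all function names (`bilinear_sample`, `refine_keypoint`) being hypothetical:

```python
import numpy as np

def bilinear_sample(fmap, x, y):
    """Bilinearly interpolate a dense feature map of shape (H, W, C)
    at a subpixel location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * fmap[y0, x0]
            + wx * (1 - wy) * fmap[y0, x1]
            + (1 - wx) * wy * fmap[y1, x0]
            + wx * wy * fmap[y1, x1])

def refine_keypoint(fmap, xy, reference, steps=30, lr=0.4, eps=0.25):
    """Adjust a keypoint (x, y) to minimize the featuremetric error
    ||F(x, y) - reference||^2 via central-difference gradient descent.
    `reference` plays the role of the feature observed in other views."""
    x, y = xy
    for _ in range(steps):
        def cost(px, py):
            d = bilinear_sample(fmap, px, py) - reference
            return float(d @ d)
        # Numerical gradient of the featuremetric cost.
        gx = (cost(x + eps, y) - cost(x - eps, y)) / (2 * eps)
        gy = (cost(x, y + eps) - cost(x, y - eps)) / (2 * eps)
        x -= lr * gx
        y -= lr * gy
    return x, y
```

On a smooth feature map, a keypoint initialized a pixel or two away from the true location slides to the position whose interpolated feature best matches the reference; the real system applies the same principle jointly over many views and tracks.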
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Pose Estimation | KITTI odometry | AUC5 | 84.34 | 51 |
| Pose Estimation | ScanNet | AUC @ 5 deg | 21.25 | 41 |
| Multi-view pose regression | CO3D v2 | RRA@15 | 33.7 | 31 |
| Camera pose estimation | CO3D v2 | AUC@30 | 30.1 | 29 |
| 3D Triangulation | ETH3D (train) | Accuracy (1cm) | 79.01 | 24 |
| Camera pose estimation | IMC | AUC (3° Threshold) | 0.4519 | 20 |
| Structure-from-Motion | IMC 2021 | AUC (3° Threshold) | 46.3 | 17 |
| Multi-View Camera Pose Estimation | ETH3D | AUC@1° | 0.5435 | 16 |
| Multi-View Camera Pose Estimation | IMC Dataset | AUC @ 3° | 45.19 | 16 |
| Multi-View Camera Pose Estimation | Texture-Poor SfM Dataset | AUC (Threshold 3°) | 20.66 | 16 |