# Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

## About
Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences. Usually, correspondences are 2D-to-2D and the pose we estimate is defined only up to scale. Some applications, aiming at instant augmented reality anywhere, require scale-metric pose estimates, and hence, they rely on external depth estimators to recover the scale. We present MicKey, a keypoint matching pipeline that is able to predict metric correspondences in 3D camera space. By learning to match 3D coordinates across images, we are able to infer the metric relative pose without depth measurements. Depth measurements are also not required for training, nor are scene reconstructions or image overlap information. MicKey is supervised only by pairs of images and their relative poses. MicKey achieves state-of-the-art performance on the Map-Free Relocalisation benchmark while requiring less supervision than competing approaches.
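Once correspondences are expressed as 3D points in each camera's metric frame, the relative pose follows from a rigid alignment between the two point sets. As a minimal illustrative sketch (not MicKey's actual differentiable solver, and with synthetic example data), the classic Kabsch/Procrustes algorithm recovers a metric rotation and translation from 3D-3D matches:

```python
import numpy as np

def metric_pose_from_3d_matches(pts_a, pts_b):
    """Kabsch/Procrustes: rigid (R, t) such that pts_b ≈ R @ pts_a + t.

    pts_a, pts_b: (N, 3) arrays of corresponding 3D points in the
    metric camera frames of image A and image B respectively.
    """
    ca, cb = pts_a.mean(axis=0), pts_b.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (pts_a - ca).T @ (pts_b - cb)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca  # translation in metric units (e.g. metres)
    return R, t

# Synthetic sanity check with a known metric pose (hypothetical data).
rng = np.random.default_rng(0)
pts_a = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])  # metric translation
pts_b = pts_a @ R_true.T + t_true
R_est, t_est = metric_pose_from_3d_matches(pts_a, pts_b)
```

In practice such a solver would be wrapped in RANSAC to reject outlier matches; the point of the sketch is only that 3D-3D correspondences determine the pose at metric scale, with no depth sensor required at test time.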
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | ScanNet (test) | Abs Rel | 0.15 | 22 |
| Monocular Depth Estimation | DIODE Outdoor (test) | RMSE | 13.76 | 18 |
| Relative Pose Estimation | Map-free dataset (test) | VCRE AUC | 0.75 | 15 |
| Multi-Scene Graph (MSG) Construction | ARKitScenes 1.0 (test) | Recall@1 | 100 | 13 |
| Monocular Depth Estimation | DIML Outdoor (test) | Delta 1 Acc | 70 | 10 |
| Relative Pose Estimation | Map-free | VCRE (90px) AUC | 0.75 | 10 |
| Relative Pose Estimation | ScanNet SuperGlue (test) | VCRE AUC | 0.99 | 9 |
| Map-free Visual Relocalization | Map-free Visual Relocalization (Official Leaderboard) | VCRE < 45px AUC | 57.2 | 5 |