MESA: Matching Everything by Segmenting Anything

About

Feature matching is a crucial task in computer vision that involves finding correspondences between images. Previous studies achieve remarkable performance using learning-based feature comparison. However, the pervasive presence of matching redundancy between images gives rise to unnecessary and error-prone computations in these methods, limiting their accuracy. To address this issue, we propose MESA, a novel approach that establishes precise area (or region) matches to reduce matching redundancy efficiently. MESA first leverages the advanced image understanding capability of SAM, a state-of-the-art foundation model for image segmentation, to obtain image areas with implicit semantics. Then, a multi-relational graph is proposed to model the spatial structure of these areas and construct their scale hierarchy. Based on graphical models derived from the graph, area matching is reformulated as an energy minimization task and effectively resolved. Extensive experiments demonstrate that MESA yields substantial precision improvement for multiple point matchers in indoor and outdoor downstream tasks, e.g., +13.61% for DKM in indoor pose estimation.
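To make the "area matching as energy minimization" idea concrete, here is a minimal toy sketch. It is not the authors' implementation: areas are stand-in axis-aligned boxes rather than SAM masks, the relation labels (`contains`, `inside`, `adjacent`, `disjoint`), the `unary` appearance costs, and the `penalty` weight are all assumptions made for illustration, and the minimizer is a brute-force search that only works for a handful of areas.

```python
from itertools import permutations

# Hypothetical toy setup: each area is a box (x0, y0, x1, y1).
# In MESA the areas come from SAM; boxes keep this sketch dependency-free.

def box_area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def intersection(a, b):
    return box_area((max(a[0], b[0]), max(a[1], b[1]),
                     min(a[2], b[2]), min(a[3], b[3])))

def relation(a, b):
    """Classify the spatial relation of area a with respect to area b."""
    inter = intersection(a, b)
    if inter == 0:
        return "disjoint"
    if inter == box_area(b):
        return "contains"   # a is a parent of b in the scale hierarchy
    if inter == box_area(a):
        return "inside"     # a is a child of b
    return "adjacent"       # partial overlap

def energy(assign, unary, rel0, rel1, penalty):
    """Unary appearance cost plus a penalty for every broken relation."""
    e = sum(unary[i][j] for i, j in enumerate(assign))
    for i in range(len(assign)):
        for k in range(i + 1, len(assign)):
            if rel0[i][k] != rel1[assign[i]][assign[k]]:
                e += penalty
    return e

def match_areas(areas0, areas1, unary, penalty=1.0):
    """Return assign, where area i in image 0 matches area assign[i] in image 1.

    Brute force over all injective assignments; needs len(areas0) <= len(areas1).
    """
    rel0 = [[relation(a, b) for b in areas0] for a in areas0]
    rel1 = [[relation(a, b) for b in areas1] for a in areas1]
    best = min(permutations(range(len(areas1)), len(areas0)),
               key=lambda p: energy(p, unary, rel0, rel1, penalty))
    return list(best)

# Image 0: a large area, a small area nested inside it, and a separate area.
areas0 = [(0, 0, 10, 10), (2, 2, 5, 5), (20, 0, 30, 10)]
# Image 1: the same layout, slightly shifted and listed in a different order.
areas1 = [(21, 1, 31, 11), (1, 1, 11, 11), (3, 3, 6, 6)]
# Made-up appearance costs (lower means more similar) that are misleading
# for the last two areas when used on their own.
unary = [[0.9, 0.2, 0.9],
         [0.1, 0.9, 0.4],
         [0.3, 0.9, 0.2]]

print(match_areas(areas0, areas1, unary))               # structure-aware
print(match_areas(areas0, areas1, unary, penalty=0.0))  # appearance only
```

With the structural penalty switched on, the nested and disjoint relations must agree across the two images, which overrides the misleading appearance costs; with `penalty=0.0` the search falls back to appearance alone.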

Yesheng Zhang, Xu Zhao • 2024

Related benchmarks

Task	Dataset	Metric	Result
Relative Pose Estimation	ScanNet 1500 pairs (test)	AUC@5°	33.4	56
Relative Pose Estimation	MegaDepth 1500 pairs (test)	AUC@5°	61.1	17
Relative Pose Estimation	MegaDepth 1500 outdoor pairs (test)	AUC@5°	61.1	17
Area Matching	ScanNet1500	AOR	0.6899	8
Visual Odometry	KITTI360 (Seq. 00)	Rotational Error (Rerr)	3.9	8
Visual Odometry	KITTI360 (Seq. 02)	Rotational Error (Rerr)	0.051	8
Visual Odometry	KITTI360 (Seq. 05)	Rotational Error (Rerr)	0.041	8
Visual Odometry	KITTI360 (Seq. 06)	Rotational Error (Rerr)	0.044	8
Shape Matching	SceneFlow	MAS@40	36.23	7
Shape Matching	KITTI	MAS @ 40mm Threshold	22.15	7
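
This page does not say how the AUC@5° values are computed. The sketch below shows the convention commonly used in relative pose benchmarks, assuming each image pair contributes a single pose error in degrees (typically the larger of the rotation and translation angular errors): the recall-versus-error curve is integrated up to the threshold and normalized by the threshold. The function name `pose_auc` and the sample errors are illustrative, not taken from the paper.

```python
from bisect import bisect_left

def pose_auc(errors, threshold):
    """Area under the recall-vs-error curve up to `threshold`, in [0, 1]."""
    errs = sorted(errors)
    n = len(errs)
    # Cumulative recall curve, starting from the origin.
    xs = [0.0] + errs
    ys = [0.0] + [(i + 1) / n for i in range(n)]
    # Keep the points below the threshold and close the curve at the threshold.
    last = bisect_left(xs, threshold)
    xs = xs[:last] + [threshold]
    ys = ys[:last] + [ys[last - 1]]
    # Trapezoidal integration, normalized by the threshold.
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))
    return area / threshold

# Made-up pose errors in degrees for three image pairs.
print(round(100 * pose_auc([1.0, 2.0, 10.0], 5.0), 1))
```

Multiplying by 100 gives a percentage on the same scale as the table entries.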
