
LoFTR: Detector-Free Local Feature Matching with Transformers

About

We present a novel method for local image feature matching. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the good matches at a fine level. In contrast to dense methods that use a cost volume to search correspondences, we use self and cross attention layers in Transformer to obtain feature descriptors that are conditioned on both images. The global receptive field provided by Transformer enables our method to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points. The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin. LoFTR also ranks first on two public benchmarks of visual localization among the published methods.
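The coarse-level stage described above can be illustrated with a minimal NumPy sketch: features from both images are conditioned on each other through interleaved self- and cross-attention, a dual-softmax turns the similarity matrix into matching probabilities, and mutual nearest neighbors are kept as coarse matches. This is a toy single-head version without learned projections or positional encodings; the function names, shapes, and temperature value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: every query aggregates values from
    # all keys, which is what gives the matcher a global receptive field.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def _softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def coarse_match(feat_a, feat_b, n_layers=2, temperature=0.1):
    """Condition each image's coarse features on the other image via
    interleaved self-/cross-attention, score all pairs with a
    dual-softmax, and keep mutual nearest neighbors as matches."""
    a, b = feat_a, feat_b
    for _ in range(n_layers):
        a = a + attention(a, a, a)   # self-attention within image A
        b = b + attention(b, b, b)   # self-attention within image B
        a2 = a + attention(a, b, b)  # cross-attention: A attends to B
        b2 = b + attention(b, a, a)  # cross-attention: B attends to A
        a, b = a2, b2
    sim = (a @ b.T) / temperature
    # Dual-softmax: matching probability over both row and column.
    p = _softmax(sim, axis=1) * _softmax(sim, axis=0)
    # Keep only mutual nearest neighbors.
    matches = [(i, j) for i, j in enumerate(p.argmax(axis=1))
               if p[:, j].argmax() == i]
    return matches, p

# Toy demo: image B's coarse features are a shuffled copy of image A's,
# so the recovered matches should undo the shuffle.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(8, 32))
perm = rng.permutation(8)
feat_b = feat_a[perm]
matches, prob = coarse_match(feat_a, feat_b)
```

In the full method these coarse matches are then refined to sub-pixel accuracy at the fine level; that stage is omitted here for brevity.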

Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou • 2021

Related benchmarks

| Task                     | Dataset                    | Metric                     | Result | Rank |
|--------------------------|----------------------------|----------------------------|--------|------|
| 6D Pose Estimation       | YCB-Video                  | –                          | –      | 148  |
| Relative Pose Estimation | MegaDepth 1500             | AUC @ 5°                   | 52.8   | 104  |
| Relative Pose Estimation | MegaDepth (test)           | Pose AUC @ 5°              | 60.15  | 83   |
| 6DoF Pose Estimation     | YCB-Video (test)           | –                          | –      | 72   |
| Homography Estimation    | HPatches                   | Overall Accuracy (< 1px)   | 44     | 59   |
| Homography Estimation    | HPatches                   | AUC @ 3px                  | 65.9   | 35   |
| Visual Localization      | 7Scenes indoor             | Positional Error (Chess, cm) | 3    | 30   |
| 6D Pose Estimation       | LineMod (test)             | –                          | –      | 29   |
| Visual Localization      | Aachen Day-Night 1.1 (Day) | Success Rate (0.25m, 2°)   | 89.1   | 28   |
| Pose Estimation          | MegaDepth 1500 (test)      | AUC @ 5°                   | 52.8   | 27   |

Showing 10 of 99 rows.

Other info

Code
