
ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

About

We tackle the efficiency problem of learning local feature matching. Recent advances have produced both purely CNN-based and transformer-based approaches. CNN-based methods often excel in matching speed, while transformer-based methods tend to provide more accurate matches. We propose an efficient transformer-based network architecture for local feature matching. It is built on two ideas: constructing multiple homography hypotheses to approximate the continuous correspondence in the real world, and using uni-directional cross-attention to accelerate the refinement. On the YFCC100M dataset, our matching accuracy is competitive with LoFTR, a state-of-the-art transformer-based architecture, while inference is 4 times faster, outpacing even the CNN-based methods. Comprehensive evaluations on other open datasets such as MegaDepth, ScanNet, and HPatches demonstrate our method's efficacy, highlighting its potential to benefit a wide array of downstream applications.
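To make the idea of approximating a continuous correspondence with several homographies concrete, here is a minimal sketch in Python with NumPy. It is not the ETO implementation: the function names `warp_points` and `piecewise_warp`, the regular patch grid, and the hand-set translation homographies are all assumptions made for illustration. Each patch of the source image carries its own 3x3 homography, and a pixel is mapped by the homography of the patch it falls in.

```python
import numpy as np

def warp_points(H, pts):
    """Map 2D points through a 3x3 homography H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3)
    mapped = homog @ H.T                              # apply H to each point
    return mapped[:, :2] / mapped[:, 2:3]             # divide by the w component

def piecewise_warp(patch_H, pts, patch_size):
    """Warp each point with the homography of the patch that contains it.

    patch_H has shape (rows, cols, 3, 3): one hypothetical homography per patch.
    """
    pts = np.asarray(pts, dtype=float)
    out = np.empty_like(pts)
    for i, (x, y) in enumerate(pts):
        row, col = int(y // patch_size), int(x // patch_size)
        out[i] = warp_points(patch_H[row, col], [(x, y)])[0]
    return out

# Illustrative setup (assumed, not from the paper): a 1x2 grid of 8-pixel patches.
patch_H = np.tile(np.eye(3), (1, 2, 1, 1))
patch_H[0, 0, 0, 2] = 5.0    # left patch: shift x by +5
patch_H[0, 1, 1, 2] = -3.0   # right patch: shift y by -3

matches = piecewise_warp(patch_H, [(2.0, 4.0), (10.0, 4.0)], patch_size=8)
print(matches)
```

The two query points land in different patches, so they are moved by different transforms. With enough patches, such a piecewise model can follow a correspondence field that no single homography describes.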

Junjie Ni, Guofeng Zhang, Guanglin Li, Yijin Li, Xinyang Liu, Zhaoyang Huang, Hujun Bao • 2024

Related benchmarks

Task | Dataset | Result | Rank
Outdoor Pose Estimation | MegaDepth (test) | AUC @ 5°: 51.7 | 10
Outdoor Pose Estimation | YFCC100M (test) | AUC @ 5°: 44.8 | 8
Indoor Pose Estimation | ScanNet 32 (test) | AUC @ 5°: 20.1 | 6
Homography Estimation | HPatches 52 sequences (illumination) + 56 sequences (viewpoint) v1.0 | Avg Corner Error (1px): 42 | 5
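For readers unfamiliar with the AUC @ 5° metric, the sketch below shows one common way pose AUC is computed in the feature-matching literature: the area under the recall-versus-error curve up to the threshold, normalized by the threshold. Whether the numbers on this page were produced with exactly this procedure is an assumption, and the function name `pose_auc` and the sample errors are hypothetical.

```python
import numpy as np

def pose_auc(errors, threshold):
    """Area under the cumulative recall curve for pose errors in [0, threshold],
    divided by the threshold so that a perfect method scores 1.0."""
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    # Start the curve at (error=0, recall=0).
    errors = np.concatenate([[0.0], errors])
    recall = np.concatenate([[0.0], recall])
    # Keep the part of the curve below the threshold and extend it flat to the threshold.
    last = np.searchsorted(errors, threshold)
    e = np.concatenate([errors[:last], [threshold]])
    r = np.concatenate([recall[:last], [recall[last - 1]]])
    # Trapezoidal integration, written out to avoid NumPy version differences.
    area = np.sum((e[1:] - e[:-1]) * (r[1:] + r[:-1]) / 2.0)
    return area / threshold

# Hypothetical per-image-pair pose errors in degrees.
auc_5 = pose_auc([1.0, 2.0, 10.0], threshold=5.0)
print(round(100 * auc_5, 1))
```

Scores are usually reported as percentages, which is why the table values fall between 0 and 100.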
