
RDD: Robust Feature Detector and Descriptor using Deformable Transformer

About

As a core step in structure-from-motion and SLAM, robust feature detection and description under challenging scenarios, such as significant viewpoint changes, remain unresolved despite their ubiquity. While recent works have identified the importance of local features in modeling geometric transformations, these methods fail to learn the visual cues present in long-range relationships. We present Robust Deformable Detector (RDD), a novel and robust keypoint detector/descriptor that leverages the deformable transformer to capture global context and geometric invariance through deformable self-attention. Specifically, we observed that deformable attention focuses on key locations, effectively reducing the complexity of the search space and modeling geometric invariance. Furthermore, in addition to the standard MegaDepth dataset, we collected an Air-to-Ground dataset for training. Our proposed method outperforms all state-of-the-art keypoint detection/description methods in sparse matching tasks and is also capable of semi-dense matching. To ensure comprehensive evaluation, we introduce two challenging benchmarks: one emphasizing large viewpoint and scale variations, and the other an Air-to-Ground benchmark, reflecting an evaluation setting that has recently gained popularity for 3D reconstruction across different altitudes.
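The abstract attributes RDD's robustness to deformable self-attention, in which each query attends to a handful of sampled locations instead of the whole feature map. The sketch below illustrates that mechanism for a single query, assuming the single-head, single-scale formulation introduced in Deformable DETR; the abstract does not specify RDD's exact layer, so this is not the authors' implementation. The function names, the weight matrices `W_offset` and `W_attn`, and the identity value projection are hypothetical simplifications.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly sample a C-dim vector from feat[H][W][C] at continuous (x, y).

    Neighbors that fall outside the map contribute zero (zero padding).
    """
    H, W, C = len(feat), len(feat[0]), len(feat[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * C
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < H and 0 <= xi < W:
                # Weight shrinks linearly with distance from the integer grid point.
                w = (1 - abs(x - xi)) * (1 - abs(y - yi))
                for c in range(C):
                    out[c] += w * feat[yi][xi][c]
    return out


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def deformable_attention(query, ref_xy, feat, W_offset, W_attn):
    """Single-head, single-scale deformable attention for one query (hypothetical sketch).

    query:    C-dim query feature.
    ref_xy:   (x, y) reference point in pixel coordinates.
    W_offset: 2K x C matrix that predicts K sampling offsets (dx, dy) from the query.
    W_attn:   K x C matrix that predicts K attention logits from the query.
    The value projection is taken as the identity for brevity.
    """
    K = len(W_attn)
    offsets = matvec(W_offset, query)
    weights = softmax(matvec(W_attn, query))  # K weights summing to 1
    C = len(feat[0][0])
    out = [0.0] * C
    for k in range(K):
        # Sample only at the K predicted locations instead of every pixel.
        x = ref_xy[0] + offsets[2 * k]
        y = ref_xy[1] + offsets[2 * k + 1]
        v = bilinear_sample(feat, x, y)
        for c in range(C):
            out[c] += weights[k] * v[c]
    return out


# Toy 2x2 map with one channel; two sampling points with equal logits.
feat = [[[1.0], [2.0]], [[3.0], [4.0]]]
W_offset = [[0.0], [0.0], [1.0], [1.0]]  # offsets (0, 0) and (1, 1) for query [1.0]
W_attn = [[0.0], [0.0]]                  # equal logits -> weights 0.5, 0.5
print(deformable_attention([1.0], (0.0, 0.0), feat, W_offset, W_attn))  # [2.5]
```

Because each query reads only K bilinearly sampled points, its cost is O(K·C) rather than the O(H·W·C) of dense attention over an H×W map, which is one way to read the abstract's claim about a reduced search space. Since the offsets are predicted from the query itself, the sampling pattern can also shift with local image content.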

Gonglin Chen, Tianwen Fu, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao (2025)

Related benchmarks

Task | Dataset | Metric | Result | Rank
Relative Pose Estimation | MegaDepth 1500 | AUC @ 5° | 52.3 | 104
Pose Estimation | MegaDepth 1500 (test) | AUC @ 5° | 51.3 | 27
Relative Pose Estimation | MegaDepth 1500 (test) | AUC @ 5° | 51.9 | 20
Visual Localization | Aachen Day-Night 1.0 (Night) | AUC @ (0.25 m, 2°) | 86.7 | 18
Relative Pose Estimation | MegaDepth View | AUC @ 5° | 54.2 | 17
Sparse 3D Reconstruction | ETH Local Feature Benchmark Madrid Metropolis v1.0 | nReg | 632 | 17
3D Reconstruction | ETH local feature benchmark Tower of London | Image Count | 834 | 16
3D Reconstruction | ETH local feature benchmark Gendarmenmarkt | Image Count | 1.06e+3 | 16
Visual Localization | Aachen Day-Night 1.0 (Day) | AUC @ (0.25 m, 2°) | 87 | 14
Relative Pose Estimation | Air-to-Ground | AUC @ 5° | 55.1 | 11

(10 of 17 benchmark results shown.)
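The AUC @ 5° entries refer to a metric that, for MegaDepth-style relative pose benchmarks, is commonly computed by taking each image pair's pose error (typically the larger of the rotation and translation-direction angular errors), building the cumulative error curve, and integrating it up to the threshold. The following is a minimal pure-Python sketch of that integration step, modeled on the widely used SuperGlue/LoFTR evaluation code; we assume, but the listing does not confirm, that the reported numbers follow this convention and are scaled by 100. `pose_auc` is a hypothetical name, and computing the per-pair angular errors is left out.

```python
import bisect


def pose_auc(errors, threshold):
    """Normalized area under the cumulative pose-error curve, up to `threshold` degrees.

    errors: per-pair pose errors in degrees; threshold must be > 0.
    Returns a value in [0, 1]; multiply by 100 for percentage-style figures.
    """
    errs = sorted(errors)
    n = len(errs)
    # Cumulative curve: at the i-th smallest error, (i + 1) / n of the pairs are within it.
    xs = [0.0] + errs
    ys = [0.0] + [(i + 1) / n for i in range(n)]
    # Keep points strictly below the threshold, then close the curve at the threshold.
    last = bisect.bisect_left(xs, threshold)
    xs = xs[:last] + [threshold]
    ys = ys[:last] + [ys[last - 1]]
    # Trapezoidal integration, normalized by the threshold.
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))
    return area / threshold


# Hypothetical per-pair errors (degrees) for three image pairs.
print(round(100 * pose_auc([1.0, 2.0, 10.0], 5.0), 2))  # 53.33
```

Normalizing by the threshold makes a method that recovers every pose perfectly score 1.0 (100), while pairs whose error exceeds the threshold contribute nothing beyond it.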

Other info

Code
