From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection
About
Keypoint-based matching is a fundamental component of modern 3D vision systems, such as Structure-from-Motion (SfM) and SLAM. Most existing learning-based methods are trained on image pairs, a paradigm that fails to explicitly optimize for the long-term trackability of keypoints across sequences under challenging viewpoint and illumination changes. In this paper, we reframe keypoint detection as a sequential decision-making problem. We introduce TraqPoint, a novel, end-to-end Reinforcement Learning (RL) framework designed to optimize the Track-quality (Traq) of keypoints directly on image sequences. Our core innovation is a track-aware reward mechanism that jointly encourages the consistency and distinctiveness of keypoints across multiple views, guided by a policy gradient method. Extensive evaluations on sparse matching benchmarks, including relative pose estimation and 3D reconstruction, demonstrate that TraqPoint significantly outperforms several state-of-the-art (SOTA) keypoint detection and description methods.
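To make the policy-gradient idea concrete, the following is a minimal sketch of a generic REINFORCE update over candidate keypoint locations. It is not the TraqPoint implementation: the reward used here (`track_reward`, the fraction of frames in which a sampled keypoint is re-matched), the toy `trackable` mask, the learning rate, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over candidate keypoint locations."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def track_reward(track_hits):
    """Toy reward (an assumption, not the paper's reward): fraction of frames
    in which each sampled keypoint was re-matched. track_hits has shape (K, T)."""
    return track_hits.mean(axis=1)

def reinforce_grad(logits, actions, rewards):
    """REINFORCE estimate of d E[reward] / d logits, using the batch mean as baseline."""
    probs = softmax(logits)
    advantages = rewards - rewards.mean()
    grad = np.zeros_like(logits)
    for a, adv in zip(actions, advantages):
        one_hot = np.zeros_like(logits)
        one_hot[a] = 1.0
        # Gradient of log softmax probability of action a is (one_hot - probs).
        grad += adv * (one_hot - probs)
    return grad / len(actions)

rng = np.random.default_rng(0)
logits = np.zeros(6)  # six hypothetical candidate locations
# Assumption for the toy setup: candidates 0 and 1 are matched in all frames,
# the remaining candidates are never matched.
trackable = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
num_frames = 4

for step in range(200):
    probs = softmax(logits)
    actions = rng.choice(len(logits), size=8, p=probs)  # sample 8 keypoints
    track_hits = np.repeat(trackable[actions][:, None], num_frames, axis=1)
    rewards = track_reward(track_hits)
    logits += 0.5 * reinforce_grad(logits, actions, rewards)  # gradient ascent

final_probs = softmax(logits)
```

In this toy setting the sampling probability mass shifts toward the candidates that are consistently re-matched across frames, which is the qualitative behavior a track-aware reward is meant to encourage.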
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Relative Pose Estimation | MegaDepth 1500 (test) | AUC@5° | 55.8 | 20 |
| Sparse 3D Reconstruction | ETH Local Feature Benchmark Madrid Metropolis v1.0 | nReg | 693 | 17 |
| 3D Reconstruction | ETH local feature benchmark Gendarmenmarkt | Image Count | 1.09e+3 | 16 |
| 3D Reconstruction | ETH local feature benchmark Tower of London | Image Count | 875 | 16 |
| Relative Pose Estimation | ScanNet (test) | AUC@5° | 16.6 | 10 |
| Visual Odometry | KITTI Odometry Benchmark Seq-01 (test) | ATE | 29.9 | 5 |
| Visual Odometry | KITTI Seq-02 Odometry Benchmark (test) | ATE | 11.8 | 5 |
| Visual Odometry | KITTI Odometry Benchmark Seq-03 (test) | ATE | 1.3 | 5 |