From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection

About

Keypoint-based matching is a fundamental component of modern 3D vision systems, such as Structure-from-Motion (SfM) and SLAM. Most existing learning-based methods are trained on image pairs, a paradigm that fails to explicitly optimize for the long-term trackability of keypoints across sequences under challenging viewpoint and illumination changes. In this paper, we reframe keypoint detection as a sequential decision-making problem. We introduce TraqPoint, a novel, end-to-end Reinforcement Learning (RL) framework designed to optimize the Track-quality (Traq) of keypoints directly on image sequences. Our core innovation is a track-aware reward mechanism that jointly encourages the consistency and distinctiveness of keypoints across multiple views, guided by a policy gradient method. Extensive evaluations on sparse matching benchmarks, including relative pose estimation and 3D reconstruction, demonstrate that TraqPoint significantly outperforms some state-of-the-art (SOTA) keypoint detection and description methods. The code will be available at https://github.com/xiaomi-research/traqpoint.
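
The abstract does not spell out the reward or the policy parameterization, so the following is only a minimal sketch of the general idea, not TraqPoint's implementation. It assumes a softmax policy over a handful of candidate keypoints and a made-up reward (`track_reward`, a hypothetical helper) that multiplies the fraction of frames in which a candidate is re-detected by a distinctiveness score in [0, 1]. The update is a plain REINFORCE step with a sample-mean baseline; all names and hyperparameters are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def track_reward(visible, distinctiveness):
    """Hypothetical reward (an assumption, not the paper's definition):
    share of frames where the candidate is re-detected, scaled by a
    distinctiveness score in [0, 1]."""
    consistency = sum(visible) / len(visible)
    return consistency * distinctiveness


def reinforce_step(logits, rewards, lr, rng, n_samples=64):
    """One REINFORCE update with a sample-mean baseline."""
    probs = softmax(logits)
    indices = list(range(len(logits)))
    picks = rng.choices(indices, weights=probs, k=n_samples)
    baseline = sum(rewards[a] for a in picks) / n_samples
    grad = [0.0] * len(logits)
    for a in picks:
        advantage = rewards[a] - baseline
        for j in indices:
            # d log pi(a) / d logit_j = 1[j == a] - pi(j)
            indicator = 1.0 if j == a else 0.0
            grad[j] += advantage * (indicator - probs[j]) / n_samples
    # Gradient ascent on the expected reward.
    return [w + lr * g for w, g in zip(logits, grad)]


def train(rewards, steps=200, lr=0.5, seed=0):
    """Run repeated REINFORCE steps from a uniform policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        logits = reinforce_step(logits, rewards, lr, rng)
    return softmax(logits)


# Three toy candidates observed over a 4-frame sequence.
rewards = [
    track_reward([1, 1, 1, 1], 0.9),  # tracked in every frame, distinctive
    track_reward([1, 0, 1, 0], 0.9),  # distinctive but lost in half the frames
    track_reward([1, 1, 1, 1], 0.2),  # stable but ambiguous
]
probs = train(rewards)
```

In this toy setting the policy shifts its probability mass toward the first candidate, the only one that is both consistently tracked and distinctive. A pair-based objective that scored only the first two frames would not separate the first and second candidates, which is the gap the sequence-level reward is meant to close.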

Yepeng Liu, Hao Li, Liwen Yang, Fangzhen Li, Xudi Ge, Yuliang Gu, Kuang Gao, Bing Wang, Guang Chen, Hangjun Ye, Yongchao Xu • 2026

Related benchmarks

Task	Dataset	Result
3D Reconstruction	ETH local feature benchmark Gendarmenmarkt	Track Length 11.06	24
3D Reconstruction	ETH local feature benchmark Tower of London	Track Length 13.28	24
Relative Pose Estimation	MegaDepth 1500 (test)	AUC@5° 55.8	20
Sparse 3D Reconstruction	ETH Local Feature Benchmark Madrid Metropolis v1.0	nReg 693	17
Relative Pose Estimation	ScanNet (test)	AUC@5° 16.6	10
Visual Odometry	KITTI Odometry Benchmark Seq-01 (test)	ATE 29.9	5
Visual Odometry	KITTI Seq-02 Odometry Benchmark (test)	ATE 11.8	5
Visual Odometry	KITTI Odometry Benchmark Seq-03 (test)	ATE 1.3	5
