DINO-VO: Learning Where to Focus for Enhanced State Estimation

About

We present DINO Patch Visual Odometry (DINO-VO), an end-to-end monocular visual odometry system with strong scene generalization. Current visual odometry (VO) systems often rely on heuristic feature extraction strategies, which can degrade accuracy and robustness, particularly in large-scale outdoor environments. DINO-VO addresses these limitations by incorporating a differentiable adaptive patch selector into the end-to-end pipeline, which improves the quality of the extracted patches and enhances generalization across diverse datasets. In addition, the system couples a multi-task feature extraction module with a differentiable bundle adjustment (BA) module that leverages inverse depth priors, allowing it to learn and exploit both appearance and geometric information. This integration bridges the gap between feature learning and state estimation. Extensive experiments on the TartanAir, KITTI, EuRoC, and TUM datasets show that DINO-VO generalizes well across synthetic, indoor, and outdoor environments and achieves state-of-the-art tracking accuracy.
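The abstract mentions a bundle adjustment module that works with inverse depth and inverse depth priors. As a generic illustration only (not the authors' implementation), the sketch below shows the standard inverse-depth reprojection used in patch-based VO and a scalar Gauss-Newton refinement of one patch's inverse depth with the relative pose held fixed. The function names `project` and `refine_inverse_depth`, the parameters `d_prior` and `w_prior`, and the quadratic prior penalty are all hypothetical assumptions. A real system would optimize many poses and depths jointly and backpropagate through the solver in an autodiff framework; this NumPy sketch does neither.

```python
import numpy as np


def project(uv, inv_depth, R, t, K):
    """Reproject pixel uv from frame i into frame j (pinhole model).

    The 3D point in frame i is ray / inv_depth with ray = K^-1 [u, v, 1].
    Scaling by inv_depth gives R @ ray + t * inv_depth, which has the same
    image projection and stays finite as inv_depth -> 0 (distant points).
    """
    fx, fy, cx, cy = K
    ray = np.array([(uv[0] - cx) / fx, (uv[1] - cy) / fy, 1.0])
    p = R @ ray + t * inv_depth
    return np.array([fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy])


def refine_inverse_depth(d0, uv_i, uv_j_obs, R, t, K,
                         d_prior, w_prior, iters=10, eps=1e-6):
    """Gauss-Newton on a single inverse depth with the pose held fixed.

    Cost: ||uv_j_obs - project(uv_i, d)||^2 + w_prior * (d - d_prior)^2.
    The prior term is a hypothetical soft penalty, not the paper's formulation.
    """
    uv_j_obs = np.asarray(uv_j_obs, dtype=float)

    def residuals(d):
        reproj = uv_j_obs - project(uv_i, d, R, t, K)
        return np.append(reproj, np.sqrt(w_prior) * (d - d_prior))

    d = float(d0)
    for _ in range(iters):
        r = residuals(d)
        J = (residuals(d + eps) - r) / eps  # forward-difference Jacobian
        JtJ = float(J @ J)
        if JtJ < 1e-12:  # no information about d (e.g. zero baseline, no prior)
            break
        d -= float(J @ r) / JtJ  # scalar Gauss-Newton step
    return d


if __name__ == "__main__":
    K = (500.0, 500.0, 320.0, 240.0)           # fx, fy, cx, cy
    R, t = np.eye(3), np.array([0.2, 0.0, 0.0])  # sideways camera motion
    obs = project((400.0, 260.0), 0.5, R, t, K)  # synthetic, true d = 0.5
    print(refine_inverse_depth(0.1, (400.0, 260.0), obs, R, t, K,
                               d_prior=0.3, w_prior=0.0))  # approximately 0.5
```

With `w_prior=0.0` the estimate recovers the true inverse depth from the reprojection residual alone; raising `w_prior` pulls the estimate toward `d_prior`, which is one simple way a depth prior can regularize poorly constrained patches.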

Qi Chen, Guanghao Li, Sijia Hu, Xin Gao, Junpeng Ma, Xiangyang Xue, Jian Pu• 2026

Related benchmarks

| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Odometry | TUM-RGBD | freiburg1/desk2 error | 0.052 | 37 |
| Visual Odometry | KITTI Odometry official (sequences 00-10) | Sequence 10 error | 8.88 | 12 |
| Visual Odometry | EuRoC | MH01 error | 0.056 | 8 |
| Monocular Visual Odometry | TartanAir monocular (test) | Mean error (Seq 000) | 0.22 | 4 |