VGGT-Motion: Motion-Aware Calibration-Free Monocular SLAM for Long-Range Consistency

About

Despite recent progress in calibration-free monocular SLAM via 3D vision foundation models, scale drift remains severe on long sequences. Motion-agnostic partitioning breaks contextual coherence and causes zero-motion drift, while conventional geometric alignment is computationally expensive. To address these issues, we propose VGGT-Motion, a calibration-free SLAM system for efficient and robust global consistency over kilometer-scale trajectories. Specifically, we first propose a motion-aware submap construction mechanism that uses optical flow to guide adaptive partitioning, prune static redundancy, and encapsulate turns for stable local geometry. We then design an anchor-driven direct Sim(3) registration strategy. By exploiting context-balanced anchors, it achieves search-free, pixel-wise dense alignment and efficient loop closure without costly feature matching. Finally, a lightweight submap-level pose graph optimization enforces global consistency with linear complexity, enabling scalable long-range operation. Experiments show that VGGT-Motion markedly improves trajectory accuracy and efficiency, achieving state-of-the-art performance in zero-shot, long-range calibration-free monocular SLAM.
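The abstract does not spell out how the anchor-driven Sim(3) registration is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that two overlapping submaps each predict a 3D point for every pixel of a shared anchor frame, so correspondences come directly from the pixel index and no feature matching or search is needed. The scale, rotation, and translation are then recovered with the standard closed-form Umeyama solution. The function name `estimate_sim3` and the inputs `src` and `dst` are hypothetical.

```python
import numpy as np

def estimate_sim3(src, dst):
    """Closed-form Umeyama alignment (a standard technique, assumed here).

    Finds scale s, rotation R, translation t minimizing
    sum_i || dst_i - (s * R @ src_i + t) ||^2
    for row-wise corresponding 3D points src, dst of shape (N, 3).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # 3x3 cross-covariance between target and source.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n       # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

For per-pixel point maps of shape (H, W, 3), the inputs would be flattened with `reshape(-1, 3)` before the call, ideally after masking out low-confidence pixels. The recovered `(s, R, t)` could then serve as a relative constraint between submaps in a pose graph, which is the role the abstract assigns to the registration step.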

Zhuang Xiong, Chen Zhang, Qingshan Xu, Wenbing Tao • 2026

Related benchmarks

Task | Dataset | Result | Rank
Monocular SLAM | KITTI (Sequences 00-10) | ATE RMSE Seq 03: 7.08 | 9
Monocular SLAM | Waymo Open (test) | Metric 1634531911.35 | 6
Monocular SLAM | 4Seasons long-sequence generalization | ATE (m): 12.22 | 3
Monocular SLAM | Complex Urban long-sequence generalization | ATE (m): 35.48 | 3
Monocular SLAM | A2D2 long-sequence generalization | ATE (m): 29.8 | 3
Monocular SLAM | TUM-Mono Handheld Sequences | Seq 17 Error: 10.31 | 3
