VGGT-Motion: Motion-Aware Calibration-Free Monocular SLAM for Long-Range Consistency

About

Despite recent progress in calibration-free monocular SLAM via 3D vision foundation models, scale drift remains severe on long sequences. Motion-agnostic partitioning breaks contextual coherence and causes zero-motion drift, while conventional geometric alignment is computationally expensive. To address these issues, we propose VGGT-Motion, a calibration-free SLAM system for efficient and robust global consistency over kilometer-scale trajectories. Specifically, we first propose a motion-aware submap construction mechanism that uses optical flow to guide adaptive partitioning, prune static redundancy, and encapsulate turns for stable local geometry. We then design an anchor-driven direct Sim(3) registration strategy. By exploiting context-balanced anchors, it achieves search-free, pixel-wise dense alignment and efficient loop closure without costly feature matching. Finally, a lightweight submap-level pose graph optimization enforces global consistency with linear complexity, enabling scalable long-range operation. Experiments show that VGGT-Motion markedly improves trajectory accuracy and efficiency, achieving state-of-the-art performance in zero-shot, long-range calibration-free monocular SLAM.
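The abstract does not say how the Sim(3) between two submaps is solved. Below is a minimal sketch of one standard way to do it. It assumes two overlapping submaps share an anchor frame and that each submap predicts a per-pixel 3D point map for that frame. Under that assumption, correspondences come directly from the pixel index, so no feature search or matching is needed. The sketch also assumes an optional per-pixel confidence, which it uses as weights. The solver is the classical weighted Umeyama closed-form similarity alignment, a stand-in that is not necessarily the authors' solver. The names `umeyama_sim3` and `sim3_matrix` are hypothetical. A Sim(3) (scale, rotation, translation) is used rather than SE(3) because monocular reconstruction leaves global scale unobservable.

```python
import numpy as np


def umeyama_sim3(src, dst, weights=None):
    """Weighted closed-form Sim(3) (Umeyama, 1991) minimising
    sum_i w_i * || dst_i - (s * R @ src_i + t) ||^2.

    src, dst: (N, 3) corresponding 3D points. In this sketch they are assumed to be
    the point maps two submaps predict for the same anchor-frame pixels.
    weights: optional (N,) non-negative weights, e.g. assumed per-pixel confidences.
    Returns scale s, rotation R (3x3), translation t (3,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 3 or src.shape[0] < 3:
        raise ValueError("expected two (N, 3) arrays with N >= 3")
    w = np.ones(len(src)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalise weights to sum to 1

    mu_src, mu_dst = w @ src, w @ dst        # weighted centroids
    x, y = src - mu_src, dst - mu_dst        # centred point sets
    cov = (y * w[:, None]).T @ x             # 3x3 weighted cross-covariance
    U, D, Vt = np.linalg.svd(cov)

    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                       # force a proper rotation (no reflection)
    R = U @ S @ Vt

    var_src = w @ (x ** 2).sum(axis=1)       # weighted variance of the source points
    s = (D * np.diag(S)).sum() / var_src     # s = tr(D S) / sigma_src^2
    t = mu_dst - s * R @ mu_src
    return s, R, t


def sim3_matrix(s, R, t):
    """Pack a Sim(3) into a 4x4 matrix so transforms compose by matrix product."""
    T = np.eye(4)
    T[:3, :3] = s * R
    T[:3, 3] = t
    return T
```

For example, aligning points that differ by a known similarity transform recovers it exactly, up to floating-point error:

```python
rng = np.random.default_rng(0)
src = rng.normal(size=(500, 3))
c, s_ = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
dst = 2.5 * src @ R_true.T + np.array([1.0, -2.0, 0.5])
s, R, t = umeyama_sim3(src, dst)   # s close to 2.5, R close to R_true
```

Because each relative transform is a 4x4 matrix, submap poses along a chain compose by multiplication. A submap-level pose graph then has one node per submap rather than one per frame, which keeps the optimization problem small.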

Zhuang Xiong, Chen Zhang, Qingshan Xu, Wenbing Tao• 2026

Related benchmarks

Task	Dataset	Result
Monocular SLAM	KITTI (Sequences 00-10)	ATE RMSE Seq 037.08	9
Monocular SLAM	Waymo Open (test)	Metric 1634531911.35	6
Monocular SLAM	4Seasons long-sequence generalization	ATE (m) 12.22	3
Monocular SLAM	Complex Urban long-sequence generalization	ATE (m) 35.48	3
Monocular SLAM	A2D2 long-sequence generalization	ATE (m) 29.8	3
Monocular SLAM	TUM-Mono Handheld Sequences	Seq 17 Error 10.31	3
