VGGT-SLAM 2.0: Real-time Dense Feed-forward Scene Reconstruction

About

We present VGGT-SLAM 2.0, a real-time RGB feed-forward SLAM system that substantially improves upon VGGT-SLAM for incrementally aligning submaps created from VGGT. First, we remove the high-dimensional 15-degree-of-freedom drift and planar degeneracy of VGGT-SLAM with a new factor graph design, while still addressing the reconstruction ambiguity of VGGT given unknown camera intrinsics. Second, by studying the attention layers of VGGT, we show that one of the layers is well suited to assist in image retrieval verification for free, without additional training, which both enables rejecting false positive matches and allows completing more loop closures. Finally, we conduct a suite of experiments, which includes showing that VGGT-SLAM 2.0 can easily be adapted for open-set object detection and demonstrating real-time performance while running online onboard a ground robot using a Jetson Thor. We test in environments ranging from cluttered indoor apartments and office scenes to a 4,200 square foot barn, and we also demonstrate that VGGT-SLAM 2.0 achieves the highest accuracy on the TUM dataset, with about 23 percent less pose error than VGGT-SLAM. Code will be released upon publication.

Dominic Maggio, Luca Carlone • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Absolute Trajectory Estimation | TUM RGB-D | Desk Error: 0.025 | 36 |
| Camera Tracking | TUM RGB-D | ATE RMSE (cm): 4 | 18 |
| Visual-Inertial Odometry | EuRoC MAV | Average Error: 1.952 | 14 |
| Visual Odometry | KITTI Odometry official (sequences 00-10) | Sequence 10 Error: 23.321 | 12 |
| Dense Reconstruction | TUM RGB-D | Completion Error: 0.21 | 9 |
| Tracking | Waymo | ATE RMSE (m): 1.295 | 7 |
| Tracking | KITTI | ATE RMSE (m): 2.521 | 7 |
| Tracking | ScanNet V2 | ATE RMSE (m): 0.073 | 6 |
| Tracking | ScanNet++ | ATE RMSE (m): 0.182 | 6 |
| Localization | 7 Scenes | ATE RMSE: 0.07 | 5 |
Showing 10 of 11 rows
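
Several of the results above are reported as ATE RMSE (absolute trajectory error, root mean square). As a minimal sketch of how that metric is commonly computed, the Python below takes two lists of camera positions and returns the RMSE of their per-pose translational differences. It assumes the estimated and ground-truth poses are already time-associated and expressed in the same frame; benchmark tooling normally performs an alignment step first, which is omitted here. The function name `ate_rmse` and the toy trajectories are illustrative only and are not taken from the paper or from any benchmark.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between two already-aligned trajectories.

    Both arguments are sequences of (x, y, z) positions with matching indices.
    Alignment and timestamp association are assumed to have been done already.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared Euclidean distance between each estimated and true position.
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data in meters (not from any dataset).
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.03, 0.0), (2.0, 0.0, 0.04)]
error_m = ate_rmse(est, gt)
print(f"ATE RMSE: {error_m:.4f} m")
```

Because the value is a distance, its unit follows the trajectory's unit, which is why the table distinguishes rows reported in centimeters from rows reported in meters.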
