VGGT-SLAM++
About
We introduce VGGT-SLAM++, a complete visual SLAM system that leverages the geometry-rich outputs of the Visual Geometry Grounded Transformer (VGGT). The system comprises a visual odometry front-end that fuses the VGGT feed-forward transformer with a Sim(3) solution, a Digital Elevation Map (DEM)-based graph construction module, and a back-end; together, these components enable accurate large-scale mapping with bounded memory. Prior transformer-based SLAM pipelines such as VGGT-SLAM rely primarily on sparse loop closures or global Sim(3) manifold constraints, which allows short-horizon pose drift. VGGT-SLAM++ instead restores high-cadence local bundle adjustment (LBA) through a spatially corrective back-end. For each VGGT submap, we construct a dense planar-canonical DEM, partition it into patches, and compute their DINOv2 embeddings to integrate the submap into a covisibility graph. Spatial neighbors are retrieved using a Visual Place Recognition (VPR) module within the covisibility window, triggering frequent local optimization that stabilizes trajectories. Across standard SLAM benchmarks, VGGT-SLAM++ achieves state-of-the-art accuracy, substantially reducing short-term drift, accelerating graph convergence, and maintaining global consistency with compact DEM tiles and sublinear retrieval.
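
The DEM, patch, and retrieval steps described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the max-height rasterization rule, the cell and patch sizes, and the brute-force cosine-similarity lookup are all assumptions made for illustration. The sketch also assumes the point cloud is already expressed in a frame whose z axis is the elevation direction, and it takes embeddings as plain vectors rather than calling DINOv2.

```python
import numpy as np

def rasterize_dem(points, cell_size=0.5):
    """Rasterize an (N, 3) point cloud into a max-height grid over the x-y plane.

    Hypothetical helper: cells that receive no point are set to NaN.
    """
    xy_min = points[:, :2].min(axis=0)
    idx = np.floor((points[:, :2] - xy_min) / cell_size).astype(int)
    shape = tuple(int(n) for n in idx.max(axis=0) + 1)
    flat = np.ravel_multi_index((idx[:, 0], idx[:, 1]), shape)
    heights = np.full(shape[0] * shape[1], -np.inf)
    # Keep the highest z value that falls into each cell.
    np.maximum.at(heights, flat, points[:, 2])
    heights[np.isneginf(heights)] = np.nan
    return heights.reshape(shape)

def split_into_patches(dem, patch=4):
    """Split a 2-D grid into non-overlapping patch x patch tiles.

    Rows and columns that do not fill a whole tile are cropped.
    """
    h, w = dem.shape
    h, w = h - h % patch, w - w % patch
    tiles = dem[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.swapaxes(1, 2).reshape(-1, patch, patch)

def top_k_neighbors(query, database, k=3):
    """Return indices of the k database rows most cosine-similar to query.

    Brute-force stand-in for a VPR lookup; embeddings are assumed given.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:k]
```

In a pipeline of this kind, each tile would be passed through an image encoder to obtain one embedding per patch, and the indices returned by the lookup would select candidate submaps for local optimization. The linear scan here is only for readability; it does not reproduce the sublinear retrieval mentioned in the abstract.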
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Absolute Trajectory Estimation | TUM RGB-D | Desk Error | 0.025 | 36 |
| Visual SLAM | KITTI Sequence 01 | Absolute Translation Error (m) | 109.6 | 11 |
| SLAM | KITTI Odometry Sequence 04 | ATE | 0.95 | 9 |
| SLAM | KITTI Odometry Sequence 10 | ATE | 15.71 | 9 |
| SLAM | KITTI Odometry Sequence 08 | ATE | 155 | 9 |
| SLAM | KITTI Odometry Sequence 03 | ATE | 4.5 | 9 |
| SLAM | KITTI Odometry Sequence 09 | ATE | 35.26 | 8 |
| SLAM | KITTI Odometry Sequence 05 | ATE | 25.21 | 8 |
| SLAM | KITTI Odometry Sequence 06 | ATE | 13.65 | 8 |
| SLAM | KITTI Odometry Sequence 07 | ATE | 12.17 | 8 |
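
The table does not state how ATE was computed. One common convention, assumed here, is the root-mean-square translation error after aligning the estimated trajectory to ground truth with a least-squares similarity transform (Umeyama alignment), which removes the scale ambiguity of monocular estimates. The sketch below follows that convention; the function names are hypothetical, and the two trajectories are assumed to be time-associated (N, 3) position arrays.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding positions.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def ate_rmse(est, gt):
    """RMSE of position error after Sim(3) alignment of est to gt."""
    s, R, t = umeyama_alignment(est, gt)
    aligned = s * (R @ est.T).T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

Under this definition, an estimate that differs from ground truth only by a global rotation, translation, and scale yields an ATE of zero, so the metric measures trajectory shape error rather than the choice of reference frame.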