
FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM

About

We present FoundationSLAM, a learning-based monocular dense SLAM system that addresses the lack of geometric consistency in previous flow-based approaches in order to achieve accurate and robust tracking and mapping. Our core idea is to bridge flow estimation with geometric reasoning by leveraging guidance from foundation depth models. To this end, we first develop a Hybrid Flow Network that produces geometry-aware correspondences, enabling consistent depth and pose inference across diverse keyframes. To enforce global consistency, we propose a Bi-Consistent Bundle Adjustment Layer that jointly optimizes keyframe pose and depth under multi-view constraints. Furthermore, we introduce a Reliability-Aware Refinement mechanism that dynamically adapts the flow update process by distinguishing between reliable and uncertain regions, forming a closed feedback loop between matching and optimization. Extensive experiments show that FoundationSLAM achieves superior trajectory accuracy and dense reconstruction quality across multiple challenging datasets while running in real time at 18 FPS, demonstrating strong generalization across scenarios and the practical applicability of our method.

Yuchen Wu, Jiahe Li, Fabio Tosi, Matteo Poggi, Jin Zheng, Xiao Bai • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual-Inertial Odometry | EuRoC (All sequences) | MH1 Error: 0.01 | 51 |
| Tracking | TUM-RGBD (various sequences) | Average Translational Error: 0.024 | 16 |
| Tracking | ETH3D-SLAM | ATE: 0.069 | 7 |
| Tracking and Mapping | 7Scenes | ATE: 0.043 | 4 |
| Simultaneous Localization and Mapping (SLAM) | EuRoC | FPS: 18 | 4 |
| Tracking and Mapping | EuRoC | ATE: 0.019 | 3 |
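
Several of the results above are reported as ATE (absolute trajectory error). As a rough illustration of what that number summarizes, the sketch below computes the root-mean-square translational error between an estimated and a ground-truth trajectory. It assumes the two trajectories are already time-associated and aligned in a common frame; the alignment and scale-correction protocol used by each benchmark is not stated on this page. The function name `ate_rmse` and the toy trajectories are hypothetical and chosen only for illustration.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of per-pose position error between two trajectories.

    Assumption: both inputs are equally long sequences of (x, y, z)
    positions that are already aligned and matched pose by pose.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    # Squared Euclidean distance between each estimated and true position.
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data, not taken from any benchmark.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.0, 0.03, 0.0), (2.0, 0.0, 0.04)]
print(round(ate_rmse(estimated, ground_truth), 4))
```

Because the error is squared before averaging, a few poses with large drift raise the score more than many poses with small drift, so a lower ATE indicates a trajectory that stays close to the ground truth throughout.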
