FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM

About

We present FoundationSLAM, a learning-based monocular dense SLAM system that addresses the lack of geometric consistency in previous flow-based approaches, enabling accurate and robust tracking and mapping. Our core idea is to bridge flow estimation with geometric reasoning by leveraging guidance from foundation depth models. To this end, we first develop a Hybrid Flow Network that produces geometry-aware correspondences, enabling consistent depth and pose inference across diverse keyframes. To enforce global consistency, we propose a Bi-Consistent Bundle Adjustment Layer that jointly optimizes keyframe poses and depths under multi-view constraints. Furthermore, we introduce a Reliability-Aware Refinement mechanism that dynamically adapts the flow update process by distinguishing between reliable and uncertain regions, forming a closed feedback loop between matching and optimization. Extensive experiments demonstrate that FoundationSLAM achieves superior trajectory accuracy and dense reconstruction quality across multiple challenging datasets while running in real time at 18 FPS, demonstrating strong generalization to diverse scenarios and the practical applicability of our method.
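
To make the link between flow, depth, and pose concrete, the following is a minimal sketch of the generic pinhole reprojection relation that flow-based dense SLAM systems typically build on. It is not the paper's Hybrid Flow Network or Bi-Consistent Bundle Adjustment Layer; the function names `induced_flow` and `reprojection_residual`, the intrinsics matrix `K`, and the relative pose `T_ij` are assumptions introduced only for illustration.

```python
import numpy as np

def induced_flow(depth, K, T_ij):
    """Flow from frame i to frame j implied by geometry alone.

    depth : (h, w) depth map of frame i (hypothetical input)
    K     : (3, 3) pinhole intrinsics, assumed shared by both frames
    T_ij  : (4, 4) rigid transform taking points from camera i to camera j
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels, (h, w, 3)

    rays = pix @ np.linalg.inv(K).T                    # back-project to normalized rays
    pts_i = rays * depth[..., None]                    # 3D points in camera i
    pts_j = pts_i @ T_ij[:3, :3].T + T_ij[:3, 3]       # same points in camera j

    proj = pts_j @ K.T                                 # project into image j
    uv_j = proj[..., :2] / proj[..., 2:3]
    return uv_j - pix[..., :2]                         # per-pixel displacement

def reprojection_residual(flow_pred, depth, K, T_ij):
    """Difference between a predicted flow field and the geometry-induced flow.

    A bundle adjustment step would adjust depth and pose to shrink this residual.
    """
    return flow_pred - induced_flow(depth, K, T_ij)
```

When a predicted flow field agrees with the current depth and pose, the residual is zero everywhere; large residuals indicate either an unreliable match or an inaccurate depth or pose estimate, which is the kind of signal a joint optimization over keyframes can exploit.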

Yuchen Wu, Jiahe Li, Fabio Tosi, Matteo Poggi, Jin Zheng, Xiao Bai • 2025

Related benchmarks

Task	Dataset	Result
Visual-Inertial Odometry	EuRoC (All sequences)	MH1 Error 0.01	69
Tracking and Mapping	7Scenes	--	22
Tracking	TUM-RGBD (various sequences)	Average Translational Error 0.024	16
Tracking	ETH3D-SLAM	ATE 0.069	7
Simultaneous Localization and Mapping (SLAM)	EuRoC	FPS 18	4
Tracking and Mapping	EuRoC	ATE 0.019	3

