CoMo3R-SLAM: Collaborative Monocular Dense SLAM with Learned 3D Reconstruction Priors for Outdoor Multi-Agent Systems
About
Collaborative dense SLAM is essential for multi-robot teams to achieve scalable and consistent 3D perception across large-scale outdoor environments. Existing systems typically depend on depth sensors, incurring significant payload, power, and calibration costs. Monocular RGB cameras are a lightweight alternative, but collaborative monocular dense SLAM remains difficult due to scale ambiguity, unreliable inter-agent data association, especially in outdoor scenes where low overlap and repetitive structures make traditional feature matching unreliable, motivating robust geometric information. We propose CoMo3R-SLAM, the first collaborative monocular dense RGB SLAM system that leverages robust learned feed-forward 3D reconstruction priors for outdoor multi-agent mapping. Each agent runs a prior-guided front-end for real-time tracking and local dense fusion, while a coordinator performs dense pointmap matching for cross-agent verification, closed-form Sim(3) gauge synchronization, and GPU-accelerated global bundle adjustment with segment-level depth optimization. Requiring neither depth sensors nor parametric intrinsics, our system produces robust cross-agent constraints and globally consistent metric maps from monocular RGB alone. On Tanks and Temples and Waymo sequences, CoMo3R-SLAM achieves the best ATE on three of four Tanks and Temples scenes and competitive Waymo accuracy, matching or exceeding state-of-the-art RGB-D methods while running online at 8 FPS.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Tracking | Waymo Open Dataset (Segment 158686) | ATE1.773 | 11 | |
| Tracking Accuracy | Tanks & Temples 1 (test) | ATE RMSE (Caterpillar) [m]0.102 | 6 | |
| Trajectory tracking | Waymo (Scene 134763) | ATE RMSE1.965 | 6 | |
| Trajectory tracking | Waymo Scene (106762) | ATE RMSE5.004 | 5 | |
| Tracking | Waymo Open Dataset (Segment 153495) | -- | 5 |