PRISM-SLAM: Probabilistic Ray-Grounded Inference for Scale-aware Metric SLAM
About
Monocular SLAM has long suffered from scale ambiguity and from tracking failure in dynamic environments. While recent vision foundation models (VFMs) provide remarkable zero-shot depth priors, naively integrating these deterministic predictions ignores their predictive uncertainty and frame-to-frame scale inconsistencies. We propose PRISM-SLAM, a real-time framework that integrates VFM priors into a structured Bayesian factor graph to achieve scale-aware, metric-consistent localization and mapping. Specifically, we introduce a Plücker Ray-Distance Factor that anchors monocular observations in a globally consistent metric coordinate system, resolving scale drift by making the metric scale Fisher-identifiable. To handle environmental dynamics, we derive an epistemic uncertainty proxy from temporal depth consistency and formulate a Dynamic Scene Uncertainty Gating (DSUG) mechanism. This soft-gating approach probabilistically down-weights dynamic distractors without the heavy computational overhead of traditional semantic segmentation masks. By employing a multi-process architecture that runs VFM inference and geometric tracking asynchronously, PRISM-SLAM provides verified metric output at 30 FPS using solely RGB input, bridging the gap between foundation models and real-world robotic applications. Evaluated on the TUM RGB-D and 7-Scenes benchmarks, PRISM-SLAM achieves a metric $SE(3)$ Absolute Trajectory Error (ATE) nearly identical to its oracle-aligned $Sim(3)$ error. This demonstrates that the system produces deployment-ready metric trajectories without any post-hoc scale correction. Project page: https://prismslam-cmd.github.io/prismslam_pr/
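To make the comparison between $SE(3)$ and $Sim(3)$ ATE concrete, the following is a minimal sketch of how the two errors are commonly computed, assuming the standard closed-form Umeyama alignment of an estimated trajectory onto ground truth. It is not taken from the PRISM-SLAM code; the function names `align` and `ate_rmse` are hypothetical and chosen for illustration only. The $Sim(3)$ variant fits rotation, translation, and a scale factor (the "oracle" alignment), while the $SE(3)$ variant fixes the scale to 1, so any error in the estimated metric scale shows up in the result.

```python
import numpy as np


def align(est, gt, with_scale):
    """Least-squares alignment of est onto gt (Umeyama closed form).

    est, gt: (N, 3) arrays of time-associated positions.
    Returns (s, R, t) such that gt ~ s * R @ est_i + t.
    Illustrative helper, not part of any PRISM-SLAM release.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / len(est)              # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                    # avoid returning a reflection
    R = U @ S @ Vt
    if with_scale:
        var_e = (e ** 2).sum() / len(est)
        s = float(np.trace(np.diag(D) @ S) / var_e)   # Sim(3): fit scale
    else:
        s = 1.0                                        # SE(3): trust metric scale
    t = mu_g - s * R @ mu_e
    return s, R, t


def ate_rmse(est, gt, with_scale):
    """ATE RMSE after Sim(3) (with_scale=True) or SE(3) alignment."""
    s, R, t = align(est, gt, with_scale)
    aligned = s * est @ R.T + t
    return float(np.sqrt(((gt - aligned) ** 2).sum(axis=1).mean()))
```

For a trajectory whose scale is already metric, `ate_rmse(est, gt, False)` and `ate_rmse(est, gt, True)` come out close to each other; for a trajectory with a wrong global scale, only the $Sim(3)$ value stays small. The gap between the two numbers is therefore a direct measure of how much a system depends on post-hoc scale correction.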
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Tracking and Mapping | 7Scenes | ATE (chess) | 7.1 | 22 |
| Monocular SLAM | Monocular SLAM Evaluation | FPS | 30 | 11 |
| Tracking | TUM RGB-D (fr1 Sequences) | Sim(3) ATE RMSE (xyz) | 2.86 | 10 |
| Dynamic Tracking | BONN Dynamic balloon2 2019 | Sim(3) ATE RMSE (cm) | 14 | 5 |
| Dynamic Tracking | BONN Dynamic 2019 (balloon) | Sim(3) ATE RMSE (cm) | 9.8 | 5 |
| Dynamic Tracking | BONN Dynamic 2019 (pers_trk) | Sim(3) ATE RMSE (cm) | 36.7 | 5 |
| Trajectory Estimation | TUM RGB-D fr3 Sequences | ATE RMSE (sit, Sim(3)) | 1.6 | 5 |
| Visual Odometry | KITTI Odometry first 500 frames (seq 03) | SE(3) ATE (m) | 4.3 | 2 |
| Dynamic Tracking | BONN Dynamic 2019 (balloon_trk) | Sim(3) ATE RMSE (cm) | 7.8 | 1 |