MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
About
We present a real-time monocular dense SLAM system designed bottom-up from MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this strong prior, our system is robust on in-the-wild video sequences despite making no assumption on a fixed or parametric camera model beyond a unique camera centre. We introduce efficient methods for pointmap matching, camera tracking and local fusion, graph construction and loop closure, and second-order global optimisation. With known calibration, a simple modification to the system achieves state-of-the-art performance across various benchmarks. Altogether, we propose a plug-and-play monocular SLAM system capable of producing globally-consistent poses and dense geometry while operating at 15 FPS.
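The single-camera-centre assumption mentioned above can be illustrated with a minimal sketch: if all rays pass through one centre, each 3D point predicted for a pixel can be reduced to a unit direction without any intrinsics or distortion model. The function names (`point_to_ray`, `pointmap_to_rays`, `angular_error`) and the nested-list pointmap layout are assumptions made for illustration only; they are not taken from the MASt3R-SLAM code.

```python
import math

def point_to_ray(point, eps=1e-8):
    """Normalise a 3D point (x, y, z) in the camera frame to a unit ray.

    Only a single camera centre at the origin is assumed; no focal
    length or distortion parameters are needed.
    """
    x, y, z = point
    norm = max(math.sqrt(x * x + y * y + z * z), eps)  # guard against zero-length points
    return (x / norm, y / norm, z / norm)

def pointmap_to_rays(pointmap):
    """Convert an H x W grid of 3D points (nested lists) to unit rays."""
    return [[point_to_ray(p) for p in row] for row in pointmap]

def angular_error(ray_a, ray_b):
    """Angle in radians between two unit rays."""
    dot = sum(a * b for a, b in zip(ray_a, ray_b))
    dot = max(-1.0, min(1.0, dot))  # clamp for numerical safety before acos
    return math.acos(dot)

# Hypothetical 1 x 2 pointmap: two pixels with predicted 3D points.
rays = pointmap_to_rays([[(0.0, 0.0, 2.0), (3.0, 0.0, 4.0)]])
print(rays[0][0], rays[0][1], angular_error(rays[0][0], rays[0][1]))
```

Comparing ray directions rather than pixel coordinates is one way to express a residual that does not depend on a parametric camera model.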
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Camera pose estimation | ScanNet | -- | 119 |
| 3D Reconstruction | 7 Scenes | -- | 94 |
| Visual-Inertial Odometry | EuRoC (All sequences) | MH1 Error: 0.023 | 62 |
| Camera pose estimation | TUM | ATE: 1.21 | 55 |
| 3D Reconstruction | NRGBD | Chamfer Distance: 0.08 | 44 |
| Absolute Trajectory Estimation | TUM RGB-D | Desk Error: 0.016 | 36 |
| Pose Estimation | ScanNet++ | -- | 32 |
| Pose Estimation | 7 Scenes | Average Median Translation Error (m): 6.6 | 29 |
| Camera pose estimation | TUM RGB-D 36 | Error (desk): 0.016 | 26 |
| 3D Reconstruction | ETH3D | F1 Score: 31 | 25 |