Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
About
We present Co-SLAM, a neural RGB-D SLAM system based on a hybrid representation that performs robust camera tracking and high-fidelity surface reconstruction in real time. Co-SLAM represents the scene as a multi-resolution hash grid to exploit its high convergence speed and ability to represent high-frequency local features. In addition, Co-SLAM incorporates one-blob encoding to encourage surface coherence and completion in unobserved areas. This joint coordinate-parametric encoding enables real-time, robust performance by bringing the best of both worlds: fast convergence and surface hole filling. Moreover, our ray sampling strategy allows Co-SLAM to perform global bundle adjustment over all keyframes, instead of requiring keyframe selection to maintain a small set of active keyframes as competing neural SLAM approaches do. Experimental results show that Co-SLAM runs at 10-17 Hz and achieves state-of-the-art scene reconstruction and competitive tracking performance on various datasets and benchmarks (ScanNet, TUM, Replica, Synthetic RGBD). Project page: https://hengyiwang.github.io/projects/CoSLAM
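To illustrate the joint coordinate-parametric idea, the sketch below combines a one-blob encoding of the input coordinates with a feature lookup from a trainable grid. This is a minimal NumPy illustration, not the Co-SLAM implementation: function names are hypothetical, the grid is a single-level dense table (a stand-in for one level of a multi-resolution hash grid), and points are assumed to lie in [0, 1]^2.

```python
import numpy as np

def one_blob_encoding(x, num_bins=16, sigma=None):
    """One-blob encoding: each scalar coordinate in [0, 1] is turned into a
    Gaussian bump over `num_bins` uniformly spaced bins, giving a smooth
    encoding that encourages coherence across nearby points."""
    if sigma is None:
        sigma = 1.0 / num_bins
    centers = (np.arange(num_bins) + 0.5) / num_bins   # bin centres in [0, 1]
    d = x[..., None] - centers                          # (..., dims, num_bins)
    return np.exp(-0.5 * (d / sigma) ** 2)

def grid_encoding(x, table, resolution):
    """Bilinear interpolation of learnable features stored at grid vertices.
    `table` has shape (resolution + 1, resolution + 1, F); in a real
    multi-resolution hash grid there are several such levels, each backed
    by a hash table instead of a dense array."""
    p = x * resolution
    i = np.clip(np.floor(p).astype(int), 0, resolution - 1)
    f = p - i                                           # fractional offsets
    c00 = table[i[..., 0],     i[..., 1]]
    c10 = table[i[..., 0] + 1, i[..., 1]]
    c01 = table[i[..., 0],     i[..., 1] + 1]
    c11 = table[i[..., 0] + 1, i[..., 1] + 1]
    fx, fy = f[..., 0:1], f[..., 1:2]
    return (c00 * (1 - fx) * (1 - fy) + c10 * fx * (1 - fy)
            + c01 * (1 - fx) * fy + c11 * fx * fy)

def joint_encoding(x, table, resolution, num_bins=16):
    """Concatenate the smooth coordinate encoding with the fast-converging
    grid features, mirroring the joint design described above."""
    blob = one_blob_encoding(x, num_bins).reshape(*x.shape[:-1], -1)
    grid = grid_encoding(x, table, resolution)
    return np.concatenate([blob, grid], axis=-1)
```

In the full system the concatenated vector is fed to a small MLP that predicts SDF and color; the grid features give fast local convergence while the coordinate encoding lets the network fill holes smoothly in unobserved regions.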
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Camera pose estimation | ScanNet | ATE RMSE (Avg.) | 8.9 | 61 |
| Camera Tracking | ScanNet v2 (test) | ATE RMSE (cm) | 5.9 | 28 |
| Tracking | TUM RGB-D (various sequences) | Average Error | 44 | 28 |
| Camera Tracking | BONN dynamic sequences | -- | -- | 25 |
| Absolute Trajectory Estimation | TUM RGB-D | Desk Error | 0.024 | 23 |
| Tracking | Bonn RGB-D dataset | Balloon2 | 20.6 | 23 |
| Reconstruction | Replica (average over 8 scenes) | Accuracy (Dist) | 2.101 | 21 |
| Visual SLAM | TUM RGB-D fr1 desk | ATE RMSE (cm) | 3.094 | 21 |
| Visual SLAM | TUM RGB-D fr2 xyz | Translation RMSE (m) | 0.3135 | 21 |
| Camera Tracking | TUM RGB-D fr1 desk | ATE RMSE | 0.024 | 16 |