iMAP: Implicit Mapping and Positioning in Real-Time
About
We show for the first time that a multilayer perceptron (MLP) can serve as the only scene representation in a real-time SLAM system for a handheld RGB-D camera. Our network is trained in live operation without prior data, building a dense, scene-specific implicit 3D model of occupancy and colour which is also immediately used for tracking. Achieving real-time SLAM via continual training of a neural network against a live image stream requires significant innovation. Our iMAP algorithm uses a keyframe structure and a multi-processing computation flow, with dynamic information-guided pixel sampling for speed; it tracks at 10 Hz and updates the global map at 2 Hz. The advantages of an implicit MLP over standard dense SLAM techniques include efficient geometry representation with automatic detail control and smooth, plausible filling-in of unobserved regions such as the back surfaces of objects.
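The abstract does not spell out how the information-guided pixel sampling works. As a rough illustration only, the sketch below assumes one plausible reading: the image is divided into a grid of cells, and a fixed pixel budget is split across cells in proportion to each cell's current rendering loss, so that poorly explained regions receive more samples. The function names, the grid layout, and the proportional rule are all assumptions made for this example, not details taken from the source.

```python
import random


def allocate_samples(cell_losses, total_samples):
    """Split a pixel budget across cells in proportion to each cell's loss.

    Hypothetical helper: the proportional rule is an assumption.
    """
    n_cells = len(cell_losses)
    total_loss = sum(cell_losses)
    if total_loss <= 0:
        # No loss signal yet: fall back to a uniform split.
        weights = [1.0 / n_cells] * n_cells
    else:
        weights = [loss / total_loss for loss in cell_losses]

    # Floor each share, then hand out the pixels lost to rounding,
    # starting with the highest-weight cells.
    counts = [int(w * total_samples) for w in weights]
    remainder = total_samples - sum(counts)
    order = sorted(range(n_cells), key=lambda i: weights[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def sample_pixels(cell_losses, grid_w, cell_w, cell_h, total_samples, rng=None):
    """Draw (x, y) pixel coordinates, uniformly within each cell.

    Cells are stored row-major; grid_w is the number of cells per row.
    """
    rng = rng or random.Random(0)
    counts = allocate_samples(cell_losses, total_samples)
    pixels = []
    for idx, n in enumerate(counts):
        row, col = divmod(idx, grid_w)
        x0, y0 = col * cell_w, row * cell_h
        for _ in range(n):
            pixels.append((x0 + rng.randrange(cell_w), y0 + rng.randrange(cell_h)))
    return pixels
```

In a live system the per-cell losses would be refreshed on every iteration from the current render, so the sampling distribution follows whichever regions the map explains worst at that moment.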
Related benchmarks
| Task | Dataset | Result (metric: value) | Rank |
|---|---|---|---|
| Camera pose estimation | ScanNet | ATE RMSE (Avg.): 33 | 61 |
| Camera Tracking | ScanNet v2 (test) | ATE RMSE (cm): 11.91 | 28 |
| Tracking | TUM RGB-D 44 (various sequences) | Average Error: 97.85 | 28 |
| Camera Tracking | BONN dynamic sequences | -- | 25 |
| Absolute Trajectory Estimation | TUM RGB-D | Desk Error: 0.049 | 23 |
| Tracking | Bonn RGB-D dataset | Balloon267 | 23 |
| Reconstruction | Replica average over 8 scenes | Accuracy (Dist): 3.621 | 21 |
| Camera Tracking | TUM RGB-D fr2 xyz | ATE RMSE: 0.02 | 16 |
| Camera Tracking | TUM RGB-D fr3 office | ATE RMSE: 0.058 | 16 |
| Camera Tracking | TUM RGB-D fr1 desk | ATE RMSE: 0.049 | 16 |