Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation
About
In this work, we present Vox-Fusion, a dense tracking and mapping system that seamlessly fuses neural implicit representations with traditional volumetric fusion methods. Our approach is inspired by recently developed implicit mapping and positioning systems and extends the idea so that it can be freely applied to practical scenarios. Specifically, we leverage a voxel-based neural implicit surface representation to encode and optimize the scene inside each voxel. Furthermore, we adopt an octree-based structure to partition the scene and support dynamic expansion, enabling our system to track and map arbitrary scenes without prior knowledge of the environment, as required by previous works. Moreover, we propose a high-performance multi-process framework to speed up the method, supporting applications that require real-time performance. The evaluation results show that our method achieves better accuracy and completeness than previous methods. We also show that Vox-Fusion can be used in augmented reality and virtual reality applications. Our source code is publicly available at https://github.com/zju3dv/Vox-Fusion.
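To make the dynamic-expansion idea concrete, here is a minimal sketch of on-demand sparse voxel allocation: as new 3D points are observed, the map hashes them into voxel cells and creates a slot (e.g. for a learnable embedding) only for cells not seen before. All names here (`SparseVoxelMap`, `allocate`) are illustrative and not taken from the Vox-Fusion codebase, which uses a full octree rather than a flat hash.

```python
import numpy as np

class SparseVoxelMap:
    """Hypothetical sketch: maps world points to voxel cells, allocating new
    cells on demand so the map can grow into unseen regions."""

    def __init__(self, voxel_size=0.2):
        self.voxel_size = voxel_size
        # (i, j, k) integer cell -> slot index (e.g. for a feature embedding)
        self.voxels = {}

    def allocate(self, points):
        """Insert observed Nx3 points; return the number of newly created voxels."""
        keys = np.floor(np.asarray(points) / self.voxel_size).astype(int)
        new = 0
        for key in map(tuple, keys):
            if key not in self.voxels:
                self.voxels[key] = len(self.voxels)  # reserve an embedding slot
                new += 1
        return new
```

A real system would additionally organize these cells in an octree for fast ray traversal and store per-voxel feature vectors optimized jointly with the decoder network; the sketch only shows the expansion bookkeeping.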
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tracking | TUM RGB-D (various sequences) | Average Error | 86.76 | 28 |
| Camera Tracking | BONN dynamic sequences | -- | -- | 25 |
| Tracking | Bonn RGB-D dataset | Balloon2 | 82.1 | 23 |
| Reconstruction | Replica average over 8 scenes | Accuracy (Dist) | 1.882 | 21 |
| Tracking | TUM RGBD (test) | fr1/desk Error | 3.52 | 18 |
| Camera Tracking | TUM RGB-D fr1 desk | ATE RMSE | 0.0352 | 16 |
| Camera Tracking | TUM RGB-D fr2 xyz | ATE RMSE | 0.0149 | 16 |
| Camera Tracking | TUM RGB-D fr3 office | ATE RMSE | 0.2601 | 16 |
| Camera Tracking | TUM dynamic scene sequences | ATE Component w_x (f3) | 146.6 | 15 |
| Camera Tracking | Replica | Rotation Error (rm-0) | 1.37 | 14 |