Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation
About
In this work, we present a dense tracking and mapping system named Vox-Fusion, which seamlessly fuses neural implicit representations with traditional volumetric fusion methods. Our approach is inspired by the recently developed implicit mapping and positioning system and further extends the idea so that it can be freely applied to practical scenarios. Specifically, we leverage a voxel-based neural implicit surface representation to encode and optimize the scene inside each voxel. Furthermore, we adopt an octree-based structure to divide the scene and support dynamic expansion, enabling our system to track and map arbitrary scenes without the prior knowledge of the environment that previous works require. Moreover, we propose a high-performance multi-process framework to speed up the method, thus supporting some applications that require real-time performance. The evaluation results show that our method achieves better accuracy and completeness than previous methods. We also show that Vox-Fusion can be used in augmented reality and virtual reality applications. Our source code is publicly available at https://github.com/zju3dv/Vox-Fusion.
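The dynamic expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a flat dictionary keyed by Morton codes stands in for the octree, each voxel holds a single feature vector, and the names `morton_encode`, `SparseVoxelMap`, `voxel_size`, and `feature_dim` are hypothetical. The point is only that voxels are allocated when new geometry is observed, so the scene extent need not be known in advance.

```python
import math
import random


def morton_encode(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three non-negative integers."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


class SparseVoxelMap:
    """Hypothetical sparse voxel store that grows as points are observed."""

    def __init__(self, voxel_size=0.2, feature_dim=16, bits=10, seed=0):
        self.voxel_size = voxel_size
        self.feature_dim = feature_dim
        self.bits = bits
        # Shift indices so that negative coordinates map to non-negative ones.
        self.offset = 1 << (bits - 1)
        self.rng = random.Random(seed)
        self.voxels = {}  # Morton key -> feature vector (list of floats)

    def _key(self, point):
        idx = [math.floor(c / self.voxel_size) + self.offset for c in point]
        if any(i < 0 or i >= (1 << self.bits) for i in idx):
            raise ValueError("point outside the addressable range")
        return morton_encode(idx[0], idx[1], idx[2], self.bits)

    def insert_points(self, points):
        """Allocate a voxel for every point that falls in unseen space.

        Returns the number of newly created voxels.
        """
        created = 0
        for p in points:
            k = self._key(p)
            if k not in self.voxels:
                # Small random initialisation; a real system would optimise it.
                self.voxels[k] = [self.rng.gauss(0.0, 0.01)
                                  for _ in range(self.feature_dim)]
                created += 1
        return created


vmap = SparseVoxelMap(voxel_size=0.2)
first = vmap.insert_points([(0.05, 0.05, 0.05), (0.1, 0.1, 0.1), (0.5, 0.0, 0.0)])
second = vmap.insert_points([(0.05, 0.05, 0.05), (-0.1, 0.0, 0.0)])
```

In this sketch, repeated observations of the same region reuse the existing voxel, while points in unseen space (including negative coordinates) trigger new allocations. A true octree would additionally give hierarchical neighbour lookups, which the flat dictionary does not.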
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Photometric Rendering | Replica (room0-2, office0-4) | PSNR 29.83 | 80 |
| Tracking | TUM RGB-D 44 (various sequences) | Average Error 86.76 | 41 |
| Camera Tracking | Replica | Rotation Error (rm-0) 1.37 | 38 |
| Camera pose estimation | TUM RGB-D 36 | Error (desk) 3.52 | 26 |
| Camera Tracking | BONN dynamic sequences | -- | 25 |
| Tracking | Bonn RGB-D dataset | Balloon282.1 | 23 |
| Reconstruction | Replica average over 8 scenes | Accuracy (Dist) 1.882 | 21 |
| Tracking | TUM RGBD (test) | fr1/desk Error 3.52 | 18 |
| Camera Tracking | TUM RGB-D | ATE RMSE (cm) 10.34 | 18 |
| Tracking | ScanNet | ATE RMSE (Seq 00) 68.84 | 18 |
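
For readers unfamiliar with the ATE RMSE metric reported in the table above, the following is a minimal sketch of how such a number is typically computed. It assumes the estimated and ground-truth camera positions are already time-associated and expressed in the same frame, in metres; benchmark evaluations usually perform a rigid alignment first, which is omitted here. The function name `ate_rmse_cm` is hypothetical.

```python
import math


def ate_rmse_cm(estimated, ground_truth):
    """Root-mean-square translational error between two trajectories.

    Both inputs are sequences of (x, y, z) positions in metres, assumed to be
    already aligned and matched index by index. The result is in centimetres.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    rmse_m = math.sqrt(sum(squared_errors) / len(squared_errors))
    return rmse_m * 100.0  # metres to centimetres


est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.02)]
error_cm = ate_rmse_cm(est, gt)
```

Because the errors are squared before averaging, a few large pose deviations dominate the score, so the metric penalises tracking failures more heavily than uniform drift.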