Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation

About

In this work, we present a dense tracking and mapping system named Vox-Fusion, which seamlessly fuses neural implicit representations with traditional volumetric fusion methods. Our approach is inspired by the recently developed implicit mapping and positioning system and extends the idea so that it can be applied to practical scenarios. Specifically, we leverage a voxel-based neural implicit surface representation to encode and optimize the scene inside each voxel. Furthermore, we adopt an octree-based structure to divide the scene and support dynamic expansion, enabling our system to track and map arbitrary scenes without prior knowledge of the environment, which previous works require. Moreover, we propose a high-performance multi-process framework to speed up the method, thus supporting applications that require real-time performance. The evaluation results show that our method achieves better accuracy and completeness than previous methods. We also show that Vox-Fusion can be used in augmented reality and virtual reality applications. Our source code is publicly available at https://github.com/zju3dv/Vox-Fusion.

Xingrui Yang, Hai Li, Hongjia Zhai, Yuhang Ming, Yuqian Liu, Guofeng Zhang • 2022
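
The abstract's idea of dividing the scene into voxels and expanding the map on demand can be illustrated with a small sketch. This is not the authors' implementation: the class name `SparseVoxelMap`, the Morton-code keys, the dictionary storage (used here in place of a real octree), the voxel size, and the placeholder per-voxel payload are all assumptions made for illustration. It only shows how voxels might be allocated lazily where points are observed, so that no scene bounds need to be known in advance beyond the addressable index range.

```python
import math


def morton_encode(ix, iy, iz, bits=10):
    """Interleave the bits of three non-negative integers into one key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


class SparseVoxelMap:
    """Hypothetical container that allocates voxels only where points fall."""

    def __init__(self, voxel_size=0.2, bits=10):
        self.voxel_size = voxel_size
        self.bits = bits
        # Shift indices so that negative coordinates map to non-negative ones.
        self.offset = 1 << (bits - 1)
        # Morton key -> per-voxel payload (a stand-in for learned features).
        self.voxels = {}

    def key_for(self, point):
        """Quantize a 3D point to its voxel index and return the Morton key."""
        idx = [math.floor(c / self.voxel_size) + self.offset for c in point]
        if any(i < 0 or i >= (1 << self.bits) for i in idx):
            raise ValueError("point outside the addressable volume")
        return morton_encode(idx[0], idx[1], idx[2], bits=self.bits)

    def insert_points(self, points):
        """Allocate voxels for unseen regions; return the number created."""
        created = 0
        for p in points:
            key = self.key_for(p)
            if key not in self.voxels:
                self.voxels[key] = {"embedding": [0.0] * 4}  # placeholder
                created += 1
        return created


if __name__ == "__main__":
    vmap = SparseVoxelMap(voxel_size=0.2)
    new_voxels = vmap.insert_points(
        [(0.05, 0.05, 0.05), (0.15, 0.1, 0.0), (1.1, 0.0, 0.0)]
    )
    print(new_voxels, len(vmap.voxels))
```

The first two points fall into the same 0.2 m voxel, so only two voxels are allocated for the three points. Inserting the same points again allocates nothing, which is the behavior an incrementally expanding map needs as new frames arrive.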

Related benchmarks

Task	Dataset	Result
Novel View Synthesis	Replica	PSNR: 24.41	198
Photometric Rendering	Replica (room0-2, office0-4)	PSNR: 29.83	80
Camera Tracking	Replica	Rotation Error (rm-0): 1.37	48
Tracking	TUM RGB-D 44 (various sequences)	Average Error: 86.76	41
Camera Tracking	BONN dynamic sequences	--	38
Camera Tracking	TUM RGB-D	Tracking Error (fr1/desk): 3.52	36
Tracking	ScanNet	ATE RMSE (Seq 00): 68.84	29
Camera pose estimation	TUM RGB-D 36	Error (desk): 3.52	26
Tracking	TUM RGBD (test)	fr1/desk Error: 3.52	24
Tracking	Bonn RGB-D dataset	Balloon: 282.1	23

Showing 10 of 67 rows
