3DVNet: Multi-View Depth Prediction and Volumetric Refinement

About

We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions that agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show that our method exceeds state-of-the-art accuracy in both depth-prediction and 3D-reconstruction metrics on the ScanNet dataset, as well as on a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is effective and generalizes to new settings.
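The abstract describes a 3D CNN that operates in world space on a feature-augmented point cloud built from all depth maps jointly. A minimal sketch of how such a point cloud could be assembled is shown here, assuming a pinhole camera model and camera-to-world poses; the function names `backproject_depth` and `fuse_views` and all parameters are hypothetical and are not taken from the authors' code. The sketch covers only the lifting and merging of depth maps, not the network or the iterative refinement.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world, feats=None):
    """Lift one depth map (H, W) to world-space points (N, 3).

    Assumptions (not from the paper): K is a (3, 3) pinhole intrinsics
    matrix, cam_to_world is a (4, 4) pose, and feats is an optional
    (H, W, C) array of per-pixel features carried along with each point.
    Pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids, each (H, W)
    valid = depth > 0
    # Homogeneous pixel coordinates of valid pixels, shape (3, N).
    pix = np.stack([u[valid], v[valid], np.ones(int(valid.sum()))], axis=0)
    # Rays in camera space with unit z, scaled by the predicted depth.
    pts_cam = (np.linalg.inv(K) @ pix) * depth[valid]
    # Rigid transform into the shared world frame.
    pts_world = cam_to_world[:3, :3] @ pts_cam + cam_to_world[:3, 3:4]
    out_feats = feats[valid] if feats is not None else None
    return pts_world.T, out_feats

def fuse_views(depths, Ks, poses, feats):
    """Concatenate the back-projected points and features of all views."""
    parts = [backproject_depth(d, K, T, f)
             for d, K, T, f in zip(depths, Ks, poses, feats)]
    points = np.concatenate([p for p, _ in parts], axis=0)
    features = np.concatenate([f for _, f in parts], axis=0)
    return points, features
```

Because every view is mapped into the same world frame, points from different depth maps that observe the same surface land near each other, which is what would let a world-space network reason about all views at once.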

Alexander Rich, Noah Stier, Pradeep Sen, Tobias Höllerer • 2021

Related benchmarks

Task	Dataset	Result
3D Geometry Reconstruction	ScanNet	Accuracy 5.1	54
2D Depth Estimation	ScanNet	AbsRel 0.04	26
3D Scene Reconstruction	ScanNet v2 (test)	Accuracy 0.221	26
Depth Estimation	TUM-RGBD	Abs Rel Error 0.076	16
3D Reconstruction	TUM-RGBD	F-score 18.1	11
3D Reconstruction	ICL-NUIM	F-score 44	11
Depth Estimation	ICL-NUIM	Abs Rel Error 0.05	11
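
For readers unfamiliar with the AbsRel (absolute relative error) metric in the table, it is commonly computed as the mean of |predicted − ground truth| / ground truth over pixels with valid ground-truth depth. The sketch below follows that common definition; the function name `abs_rel` is hypothetical, and the exact masking and depth range used for the numbers above are not stated on this page, so treating `gt > 0` as "valid" is an assumption.

```python
import numpy as np

def abs_rel(pred, gt):
    """Mean absolute relative depth error over pixels with valid ground truth.

    Assumption: a ground-truth depth of 0 (or less) marks a missing pixel.
    """
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))
```

Lower values are better; a value of 0.04 under this definition would mean predictions are off by about 4% of the true depth on average.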

