MVSNet: Depth Inference for Unstructured Multi-view Stereo

About

We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features and then build a 3D cost volume on the reference camera frustum via differentiable homography warping. Next, we apply 3D convolutions to regularize the cost volume and regress an initial depth map, which is then refined with the reference image to generate the final output. Our framework flexibly adapts to arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature. The proposed MVSNet is demonstrated on the large-scale indoor DTU dataset. With simple post-processing, our method not only significantly outperforms previous state-of-the-art methods but is also several times faster at runtime. We also evaluate MVSNet on the complex outdoor Tanks and Temples dataset, where our method ranked first before April 18, 2018 without any fine-tuning, showing the strong generalization ability of MVSNet.
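The variance-based cost metric and the depth regression step can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes the per-view features have already been warped into the reference frustum, it skips the 3D convolutional regularization entirely, and it assumes the regression is an expectation over a softmax of negated costs (a soft argmin). The function names `variance_cost` and `soft_argmin_depth` and the toy numbers are hypothetical.

```python
import math


def variance_cost(features):
    """Element-wise variance across N views.

    features: list of N equal-length lists, one per view, holding
    already-warped feature values at the same depth hypotheses.
    Works for any N >= 1, which is what lets the metric accept
    an arbitrary number of input views.
    """
    n = len(features)
    size = len(features[0])
    mean = [sum(view[i] for view in features) / n for i in range(size)]
    return [
        sum((view[i] - mean[i]) ** 2 for view in features) / n
        for i in range(size)
    ]


def soft_argmin_depth(costs, depths):
    """Expected depth under a softmax over negated costs (assumed form)."""
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, depths)) / total


# Toy example: one pixel, one feature channel, three depth hypotheses.
# The three views agree only at the middle hypothesis.
views = [
    [0.9, 0.5, 0.1],
    [0.1, 0.5, 0.8],
    [0.5, 0.5, 0.3],
]
depths = [1.0, 2.0, 3.0]

costs = variance_cost(views)
depth = soft_argmin_depth(costs, depths)
print(costs, depth)
```

The cost is lowest where the views agree, so the regressed depth is pulled toward the middle hypothesis. Because the estimate is a weighted average rather than a hard argmin, it stays differentiable and can fall between the sampled depth values.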

Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan • 2018

Related benchmarks

Task	Dataset	Metric	Value	
Monocular Depth Estimation	DDAD (test)	RMSE	8.21	122
Multi-view Stereo	Tanks and Temples Intermediate set	Mean F1 Score	43.48	110
Multi-view Stereo	DTU (test)	Accuracy	39.6	61
Multi-view Stereo	Tanks & Temples Intermediate	F-score	43.48	56
Multi-view Stereo	DTU 1 (evaluation)	Accuracy Error (mm)	0.396	51
3D Reconstruction	DTU	Average Error	2.38	47
Multi-view Stereo	Tanks&Temples	Family	55.99	46
Multi-view Depth Estimation	DDAD (test)	AbsRel	0.112	40
Multi-view Stereo Reconstruction	DTU (evaluation)	Mean Distance (mm) - Acc.	0.396	35
2D Depth Estimation	7 Scenes	Abs Rel	0.2339	28

Showing 10 of 25 rows
