
MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

About

Deep learning has made significant impacts on multi-view stereo systems. State-of-the-art approaches typically involve building a cost volume, followed by multiple 3D convolution operations, to recover the input image's pixel-wise depth. While such end-to-end learning of plane-sweeping stereo improves accuracy on public benchmarks, these methods are typically very slow to compute. We present MVS2D, a highly efficient multi-view stereo algorithm that seamlessly integrates multi-view constraints into single-view networks via an attention mechanism. Since MVS2D builds only on 2D convolutions, it is at least 2x faster than all notable counterparts. Moreover, our algorithm produces precise depth estimates and 3D reconstructions, achieving state-of-the-art results on the challenging ScanNet, SUN3D, and RGBD benchmarks, as well as the classical DTU dataset. Our algorithm also outperforms all other algorithms in the setting of inexact camera poses. Our code is released at https://github.com/zhenpeiyang/MVS2D
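The core idea above — replacing 3D cost-volume convolutions with per-pixel attention over depth hypotheses, so that depth estimation stays a purely 2D operation — can be illustrated with a toy sketch. This is not the authors' implementation; the function name, shapes, and the omitted epipolar warping step are assumptions for illustration only.

```python
import numpy as np

def attention_depth_sketch(ref_feat, src_feats, depth_values):
    """Toy sketch of attention-based multi-view depth fusion.

    ref_feat:     (H, W, C) reference-view feature map.
    src_feats:    (D, H, W, C) source-view features, assumed already
                  warped to the reference view at D depth hypotheses
                  (the epipolar warping itself is omitted here).
    depth_values: (D,) candidate depth values.

    Returns a per-pixel expected depth from softmax attention weights.
    """
    # Scaled dot-product matching score per pixel and depth hypothesis.
    scores = np.einsum('hwc,dhwc->dhw', ref_feat, src_feats)
    scores = scores / np.sqrt(ref_feat.shape[-1])

    # Softmax over the depth dimension -> attention weights.
    scores -= scores.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=0, keepdims=True)

    # Soft-argmax: attention-weighted expected depth per pixel.
    return np.einsum('d,dhw->hw', depth_values, w)

# Tiny example: 4 depth hypotheses on a 2x2 image with 8-dim features.
rng = np.random.default_rng(0)
ref = rng.standard_normal((2, 2, 8))
src = rng.standard_normal((4, 2, 2, 8))
depths = np.linspace(0.5, 2.0, 4)
depth_map = attention_depth_sketch(ref, src, depths)
print(depth_map.shape)  # (2, 2)
```

Because the attention weights form a convex combination, each output depth lies within the hypothesized depth range; the heavy lifting happens entirely in 2D tensor operations, which is the source of the speedup the paper claims.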

Zhenpei Yang, Zhile Ren, Qi Shan, Qixing Huang • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Monocular Depth Estimation | DDAD (test) | RMSE | 9.82 | 122 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 0.058 | 103 |
| Depth Estimation | ScanNet | AbsRel | 0.098 | 94 |
| Multi-view Stereo | DTU (test) | -- | -- | 61 |
| Multi-view Depth Estimation | DDAD (test) | AbsRel | 0.133 | 40 |
| Multi-view Stereo Reconstruction | DTU (evaluation) | Mean Distance (mm) - Acc. | 0.394 | 35 |
| Multi-view Depth Estimation | ScanNet (test) | Abs Rel | 0.059 | 23 |
| Depth Estimation | ScanNet v1 (test) | AbsRel | 0.059 | 11 |
| Video Depth Estimation | ScanNet++ | Absolute Relative Error | 27.2 | 10 |
| Depth Estimation | SUN3D (Real) | AbsRel | 0.099 | 7 |

Showing 10 of 15 rows.
