Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement
About
Almost all previous deep learning-based multi-view stereo (MVS) approaches focus on improving reconstruction quality. Besides quality, efficiency is also a desirable feature for MVS in real scenarios. To this end, this paper presents Fast-MVSNet, a novel sparse-to-dense, coarse-to-fine framework for fast and accurate depth estimation in MVS. Specifically, Fast-MVSNet first constructs a sparse cost volume for learning a sparse, high-resolution depth map. It then uses a small-scale convolutional neural network to encode the depth dependencies for pixels within a local region and densify the sparse high-resolution depth map. Finally, a simple but efficient Gauss-Newton layer further optimizes the depth map. On the one hand, the high-resolution depth map, the data-adaptive propagation method, and the Gauss-Newton layer jointly guarantee the effectiveness of the method. On the other hand, all modules in Fast-MVSNet are lightweight and thus guarantee the efficiency of the approach. The approach is also memory-friendly because of the sparse depth representation. Extensive experimental results show that the method is 5$\times$ and 14$\times$ faster than Point-MVSNet and R-MVSNet, respectively, while achieving comparable or even better results on the challenging Tanks and Temples dataset as well as the DTU dataset. Code is available at https://github.com/svip-lab/FastMVSNet.
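The Gauss-Newton refinement mentioned in the abstract can be illustrated with a minimal sketch. This is not the released implementation: it assumes a single scalar depth per pixel and one residual per source view, with the Gauss-Newton update taking the form delta = -(J^T J)^{-1} J^T r. The names `gauss_newton_depth_step`, `refine_depth`, `residual_fn`, and `jacobian_fn` are hypothetical, and the linear toy residuals stand in for the feature differences a real network would compute.

```python
def gauss_newton_depth_step(residuals, jacobians, eps=1e-8):
    """Return one Gauss-Newton update for a single scalar depth.

    residuals: list of r_i(d), one residual per source view (assumed setup).
    jacobians: list of dr_i/dd evaluated at the current depth.
    eps: small damping term that avoids division by zero.
    """
    jtj = sum(j * j for j in jacobians)                      # J^T J (a scalar here)
    jtr = sum(j * r for j, r in zip(jacobians, residuals))   # J^T r
    return -jtr / (jtj + eps)


def refine_depth(depth, residual_fn, jacobian_fn, num_steps=1):
    """Apply num_steps Gauss-Newton updates to an initial depth estimate."""
    for _ in range(num_steps):
        depth += gauss_newton_depth_step(residual_fn(depth), jacobian_fn(depth))
    return depth


# Toy problem (illustrative only): residuals are linear in depth,
# r_i(d) = a_i * (d - TRUE_DEPTH), so a single step lands on the minimum.
TRUE_DEPTH = 3.0
SLOPES = [0.5, 2.0, -1.0]


def residual_fn(d):
    return [a * (d - TRUE_DEPTH) for a in SLOPES]


def jacobian_fn(d):
    return list(SLOPES)


refined = refine_depth(2.0, residual_fn, jacobian_fn, num_steps=1)
print(refined)
```

Because the toy residuals are linear, one update is enough to reach the minimum up to the damping term. With nonlinear residuals, several steps or a good initial depth would typically be needed.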
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-view Stereo | Tanks and Temples Intermediate set | Mean F1 Score 47.39 | 110 |
| Multi-view Stereo | DTU (test) | -- | 61 |
| 3D Geometry Reconstruction | ScanNet | Accuracy 5.9 | 54 |
| Multi-view Stereo | DTU 1 (evaluation) | Accuracy Error (mm) 0.336 | 51 |
| Multi-view Stereo | Tanks & Temples Intermediate | F-score 47.39 | 43 |
| Multi-view Stereo Reconstruction | DTU (evaluation) | Mean Distance (mm) - Acc. 0.336 | 35 |
| 2D Depth Estimation | ScanNet | AbsRel 0.084 | 26 |
| Multi-view Depth Estimation | ScanNet (test) | Abs Rel 0.089 | 23 |
| Depth Estimation | TUM-RGBD | Abs Rel Error 0.113 | 16 |
| Point Cloud Reconstruction | DTU (test) | Accuracy 33.6 | 15 |