StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction
About
This paper presents StereoNet, the first end-to-end deep architecture for real-time stereo matching that runs at 60 fps on an NVIDIA Titan X, producing high-quality, edge-preserved, quantization-free disparity maps. A key insight of this paper is that the network achieves a sub-pixel matching precision that is an order of magnitude higher than that of traditional stereo matching approaches. This allows us to achieve real-time performance by using a very low resolution cost volume that encodes all the information needed to achieve high disparity precision. Spatial precision is achieved by employing a learned edge-aware upsampling function. Our model uses a Siamese network to extract features from the left and right images. A first estimate of the disparity is computed in a very low resolution cost volume; the model then hierarchically re-introduces high-frequency details through a learned upsampling function that uses compact pixel-to-pixel refinement networks. Leveraging color input as a guide, this function is capable of producing high-quality edge-aware output. We achieve compelling results on multiple benchmarks, showing how the proposed method offers extreme flexibility at an acceptable computational budget.
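To make the coarse stage of this pipeline concrete, the following is a minimal NumPy sketch of a low-resolution cost volume followed by a sub-pixel disparity estimate. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the matching cost or the disparity regression, so the sketch assumes a mean absolute feature difference and a soft argmin (a softmax-weighted average over candidate disparities). The function names `cost_volume`, `soft_argmin`, and `upsample_disparity` are hypothetical, and the learned edge-aware refinement networks are replaced here by plain nearest-neighbour upsampling with no color guidance.

```python
import numpy as np

def cost_volume(feat_l, feat_r, max_disp):
    """Matching cost for each candidate disparity d in [0, max_disp).

    feat_l, feat_r: (H, W, C) feature maps, e.g. from a shared (Siamese) encoder.
    Returns an array of shape (max_disp, H, W).
    Assumption: the cost is the mean absolute feature difference.
    """
    h, w, _ = feat_l.shape
    # Large cost where the right-image sample would fall outside the image.
    vol = np.full((max_disp, h, w), 1e3, dtype=np.float64)
    for d in range(max_disp):
        # Left pixel x is compared against right pixel x - d.
        diff = feat_l[:, d:, :] - feat_r[:, : w - d, :]
        vol[d, :, d:] = np.abs(diff).mean(axis=-1)
    return vol

def soft_argmin(vol, temperature=1.0):
    """Softmax-weighted mean disparity; yields fractional (sub-pixel) values."""
    logits = -vol / temperature
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    disps = np.arange(vol.shape[0]).reshape(-1, 1, 1)
    return (weights * disps).sum(axis=0)

def upsample_disparity(disp, factor):
    """Nearest-neighbour stand-in for the learned upsampling.

    Disparities are measured in pixels, so they scale with the resolution.
    """
    return np.kron(disp, np.ones((factor, factor))) * factor

# Usage: the left features are the right features shifted by 3 pixels.
rng = np.random.default_rng(0)
feat_r = rng.normal(size=(4, 16, 8))
feat_l = np.roll(feat_r, 3, axis=1)
coarse = soft_argmin(cost_volume(feat_l, feat_r, max_disp=8), temperature=0.01)
full = upsample_disparity(coarse, factor=4)
```

Because the soft argmin averages over candidates rather than picking one, two equally good neighbouring disparities produce a value halfway between them, which is one way a coarse cost volume can still yield sub-pixel estimates.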
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Stereo Matching | KITTI 2015 (test) | D1 Error (Overall): 0.0483 | 144 |
| Stereo Matching | KITTI 2015 | D1 Error (All): 4.83 | 118 |
| Stereo Matching | KITTI 2012 | -- | 81 |
| Disparity Estimation | KITTI 2015 (test) | D1 Error (bg, all): 4.3 | 77 |
| Stereo Matching | KITTI 2012 (test) | -- | 76 |
| Stereo Matching | Scene Flow (test) | EPE: 1.1 | 70 |
| Disparity Estimation | Scene Flow (test) | -- | 24 |
| Stereo Matching | Scene Flow (finalpass) | EPE (px): 1.1 | 22 |
| Disparity Estimation | KITTI 2012 (test) | Mean Error (Noc): 0.8 | 9 |