StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction

About

This paper presents StereoNet, the first end-to-end deep architecture for real-time stereo matching that runs at 60 fps on an NVIDIA Titan X, producing high-quality, edge-preserved, quantization-free disparity maps. A key insight of this paper is that the network achieves a sub-pixel matching precision that is an order of magnitude higher than that of traditional stereo matching approaches. This allows us to achieve real-time performance by using a very low resolution cost volume that encodes all the information needed to achieve high disparity precision. Spatial precision is achieved by employing a learned edge-aware upsampling function. Our model uses a Siamese network to extract features from the left and right images. A first estimate of the disparity is computed in a very low resolution cost volume; the model then hierarchically re-introduces high-frequency details through a learned upsampling function that uses compact pixel-to-pixel refinement networks. Leveraging the color input as a guide, this function is capable of producing high-quality edge-aware output. We achieve compelling results on multiple benchmarks, showing how the proposed method offers extreme flexibility at an acceptable computational budget.
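To make the cost-volume idea in the abstract concrete, here is a minimal NumPy sketch of how a coarse disparity estimate with sub-pixel precision might be read out of a low-resolution cost volume. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_cost_volume` and `soft_argmin` are hypothetical, the matching cost (mean absolute feature difference) is a stand-in for whatever the learned network computes, and the soft arg-min readout is a commonly used differentiable technique that the abstract itself does not name.

```python
import numpy as np

def build_cost_volume(feat_left, feat_right, max_disp):
    """Build an (H, W, max_disp) matching-cost volume from two feature maps.

    feat_left, feat_right: arrays of shape (H, W, C), e.g. low-resolution
    features from a shared (Siamese) extractor. The cost used here, mean
    absolute feature difference, is an assumption made for illustration.
    """
    h, w, _ = feat_left.shape
    # Positions where a candidate disparity would fall outside the image
    # keep an infinite cost, so they receive zero weight later.
    cost = np.full((h, w, max_disp), np.inf)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at x - d.
        diff = np.abs(feat_left[:, d:, :] - feat_right[:, :w - d, :])
        cost[:, d:, d] = diff.mean(axis=2)
    return cost

def soft_argmin(cost):
    """Expected disparity under a softmax over negated costs.

    Unlike a hard argmin, the result is a real number, which is what gives
    the estimate sub-pixel precision.
    """
    neg = -cost
    neg = neg - neg.max(axis=2, keepdims=True)  # numerical stability
    weights = np.exp(neg)
    weights /= weights.sum(axis=2, keepdims=True)
    disparities = np.arange(cost.shape[2])
    return (weights * disparities).sum(axis=2)
```

Because the volume is built at low resolution, each coarse disparity step covers several full-resolution pixels, so the fractional output of the readout matters; the refinement stages described in the abstract would then upsample this estimate and restore edges.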

Sameh Khamis, Sean Fanello, Christoph Rhemann, Adarsh Kowdle, Julien Valentin, Shahram Izadi • 2018

Related benchmarks

Task	Dataset	Result
Stereo Matching	KITTI 2015 (test)	D1 Error (Overall) 0.0483	245
Stereo Matching	KITTI 2015	D1 Error (All) 4.83	142
Stereo Matching	KITTI 2012	--	108
Stereo Matching	KITTI 2012 (test)	--	105
Stereo Matching	Scene Flow (test)	EPE 1.1	84
Disparity Estimation	KITTI 2015 (test)	D1 Error (bg, all) 4.3	77
Disparity Estimation	Scene Flow (test)	--	24
Stereo Matching	Scene Flow (finalpass)	EPE (px) 1.1	22
Disparity Estimation	KITTI 2012 (test)	Mean Error (Noc) 0.8	9
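
For readers unfamiliar with the metrics in the table, the sketch below shows how end-point error (EPE) and the KITTI 2015 D1 outlier rate are commonly computed from a predicted and a ground-truth disparity map. The function names `epe` and `d1_error` are hypothetical and are not taken from any benchmark toolkit. The D1 rule used here (a pixel counts as an outlier when its error exceeds both 3 px and 5% of the true disparity) follows the KITTI 2015 definition. `d1_error` returns a fraction; multiply by 100 for a percentage.

```python
import numpy as np

def epe(pred, gt, valid=None):
    """Mean absolute disparity error in pixels over valid pixels."""
    if valid is None:
        valid = np.ones_like(gt, dtype=bool)
    return float(np.abs(pred - gt)[valid].mean())

def d1_error(pred, gt, valid=None):
    """Fraction of valid pixels whose error is > 3 px and > 5% of the truth."""
    if valid is None:
        valid = np.ones_like(gt, dtype=bool)
    err = np.abs(pred - gt)
    outlier = (err > 3.0) & (err > 0.05 * np.abs(gt))
    return float(outlier[valid].mean())
```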
