Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

About

We present a method for extracting depth information from a rectified image pair. Our approach focuses on the first stage of many stereo algorithms: the matching cost computation. We approach the problem by learning a similarity measure on small image patches using a convolutional neural network. Training is carried out in a supervised manner by constructing a binary classification data set with examples of similar and dissimilar pairs of patches. We examine two network architectures for this task: one tuned for speed, the other for accuracy. The output of the convolutional neural network is used to initialize the stereo matching cost. A series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo data sets and show that it outperforms other approaches on all three data sets.
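One of the post-processing steps named above, the left-right consistency check, can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes disparity maps stored as nested lists, the usual rectified-stereo convention that left pixel (x, y) with disparity d corresponds to right pixel (x - d, y), and a tolerance of one pixel. The function name `left_right_check`, the `tol` parameter, and the use of `None` to mark rejected pixels are hypothetical choices made for this example.

```python
from typing import List, Optional


def left_right_check(
    disp_left: List[List[float]],
    disp_right: List[List[float]],
    tol: float = 1.0,  # assumed tolerance in pixels
) -> List[List[Optional[float]]]:
    """Keep a left disparity only if the right disparity map agrees with it.

    Pixels whose match falls outside the image, or whose left and right
    disparities differ by more than `tol`, are replaced with None.
    """
    height = len(disp_left)
    width = len(disp_left[0]) if height else 0
    checked: List[List[Optional[float]]] = []
    for y in range(height):
        row: List[Optional[float]] = []
        for x in range(width):
            d = disp_left[y][x]
            # Column of the corresponding pixel in the right image.
            xr = x - int(round(d))
            if 0 <= xr < width and abs(d - disp_right[y][xr]) <= tol:
                row.append(d)
            else:
                row.append(None)
        checked.append(row)
    return checked


if __name__ == "__main__":
    left = [[2, 2, 2, 2]]
    right = [[2, 2, 0, 0]]
    # The first two pixels map outside the right image and are rejected.
    print(left_right_check(left, right))  # [[None, None, 2, 2]]
```

Rejected pixels would typically be filled in afterwards from consistent neighbors; that interpolation step is left out of this sketch.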

Jure Žbontar, Yann LeCun • 2015

Related benchmarks

Task	Dataset	Metric	Result
Stereo Matching	KITTI 2015 (test)	D1 Error (Overall)	3.88	245
Stereo Matching	KITTI 2015	D1 Error (All)	3.89	142
Stereo Matching	KITTI 2012	Error Rate (3px, All)	0.0322	108
Stereo Matching	KITTI 2012 (test)	Outlier Rate (3px, Noc)	2.09	105
Disparity Estimation	KITTI 2015 (test)	D1 Error (bg, all)	2.48	77
Stereo Matching	KITTI Noc 2015	D1 Error (Background)	2.48	42
Stereo Matching	Middlebury v3	Bad Pixel Rate (Thresh 2.0)	8.08	35
Stereo Matching	KITTI 2012 (Noc)	Error Rate (>2px)	3.9	26
Stereo Matching	KITTI 2012 (All split)	Error Rate (>2px)	5.45	26
Disparity Estimation	Scene Flow (test)	EPE	3.79	24
