End-to-End Learning of Geometry and Context for Deep Stereo Regression
About
We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem's geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new state-of-the-art benchmark, while being significantly faster than competing approaches.
Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry • 2017
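The soft argmin mentioned in the abstract can be illustrated for a single pixel. The sketch below assumes the usual formulation, in which the matching costs along the disparity axis are negated, passed through a softmax, and used to weight the candidate disparity indices. The function name `soft_argmin` and the example costs are hypothetical, and the code is plain Python for illustration, not the authors' implementation.

```python
import math


def soft_argmin(costs):
    """Return the softmax(-cost)-weighted mean of disparity indices.

    `costs` holds one matching cost per candidate disparity (index = disparity).
    Lower cost means a better match, hence the negation before the softmax.
    """
    # Subtract the minimum cost so the largest exponent is exp(0) = 1,
    # which keeps the softmax numerically stable.
    lowest = min(costs)
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    # Expected disparity under the resulting probability distribution.
    return sum(d * w / total for d, w in enumerate(weights))


# Two equally good matches at disparities 1 and 2 give an estimate
# between them, which a hard argmin could not produce.
estimate = soft_argmin([9.0, 1.0, 1.0, 9.0])
print(estimate)
```

Because the output is a smooth function of the costs, gradients can flow back through it during training, and the weighted mean can land between integer disparities. That is what makes sub-pixel estimates possible without a separate refinement step.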
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 (test) | D1 Error (Overall) | 2.67 | 144 |
| Stereo Matching | KITTI 2015 | D1 Error (All) | 2.87 | 118 |
| Stereo Matching | KITTI 2012 | Error Rate (3px, Noc) | 0.0177 | 81 |
| Disparity Estimation | KITTI 2015 (test) | D1 Error (bg, all) | 2.02 | 77 |
| Stereo Matching | KITTI 2012 (test) | Outlier Rate (3px, Noc) | 1.77 | 76 |
| Stereo Matching | Scene Flow (test) | EPE | 1.84 | 70 |
| Depth Estimation | ScanNet (test) | Abs Rel | 0.107 | 65 |
| Stereo Matching | KITTI Noc 2015 | D1 Error (Background) | 2.02 | 32 |
| Stereo Matching | KITTI 2012 (Noc) | Error Rate (>2px) | 2.71 | 26 |
| Stereo Matching | KITTI 2012 (All split) | Error Rate (>2px) | 3.46 | 26 |
Showing 10 of 24 rows