Hierarchical Deep Stereo Matching on High-resolution Images
About
We explore the problem of real-time stereo matching on high-resolution imagery. Many state-of-the-art (SOTA) methods struggle to process high-resolution imagery because of memory constraints or speed limitations. To address this issue, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy. Because high-resolution stereo datasets are relatively rare, we introduce a dataset with high-resolution stereo pairs for both training and evaluation. Our approach achieved SOTA performance on Middlebury-v3 and KITTI-15 while running significantly faster than its competitors. The hierarchical design also naturally supports anytime, on-demand reports of disparity by capping intermediate coarse results, which lets us accurately predict disparity for near-range structures with low latency (30 ms). We demonstrate that the performance-vs-speed trade-off afforded by on-demand hierarchies may address sensing needs for time-critical applications such as autonomous driving.
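
To make the coarse-to-fine idea concrete, here is a minimal sketch on a single pair of scanlines. It is not the paper's network: it swaps the learned features and cost volumes for plain window matching with a sum of absolute differences, and every name in it (`downsample`, `match_cost`, `coarse_to_fine`, `max_disp_coarse`, `stop_level`) is a hypothetical choice made for illustration. What it does show is the search pattern the abstract describes: a full disparity search only at the coarsest level, a small residual search around the upsampled estimate at each finer level, and the option to stop early and report a coarse result.

```python
def downsample(row):
    """Halve a scanline by averaging neighbouring pixel pairs."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def match_cost(left, right, x, d, radius=1):
    """Sum of absolute differences between windows at left[x] and right[x - d].

    Indices are clamped at the borders, so border pixels are less reliable.
    """
    cost = 0.0
    for k in range(-radius, radius + 1):
        xl = min(max(x + k, 0), len(left) - 1)
        xr = min(max(x + k - d, 0), len(right) - 1)
        cost += abs(left[xl] - right[xr])
    return cost


def coarse_to_fine(left, right, levels=3, max_disp_coarse=4, stop_level=0):
    """Estimate per-pixel disparity for one scanline pair.

    Level 0 is full resolution. Setting stop_level > 0 returns the
    intermediate (coarser) estimate, mimicking an on-demand early exit.
    """
    # Build pyramids: index 0 is full resolution, the last index is coarsest.
    pyr_l, pyr_r = [left], [right]
    for _ in range(levels - 1):
        pyr_l.append(downsample(pyr_l[-1]))
        pyr_r.append(downsample(pyr_r[-1]))

    disp = None
    for level in range(levels - 1, stop_level - 1, -1):
        l, r = pyr_l[level], pyr_r[level]
        new_disp = []
        for x in range(len(l)):
            if disp is None:
                # Coarsest level: search the whole (small) disparity range.
                candidates = range(0, max_disp_coarse + 1)
            else:
                # Finer level: double the coarse estimate, then refine by +/-1.
                prior = 2 * disp[min(x // 2, len(disp) - 1)]
                candidates = range(max(prior - 1, 0), prior + 2)
            new_disp.append(min(candidates, key=lambda d: match_cost(l, r, x, d)))
        disp = new_disp
    return disp
```

With three levels, a search range of 5 candidates at the coarsest level covers disparities up to roughly 19 pixels at full resolution while each finer level tests only 3 candidates per pixel, which is where the memory and speed savings of a hierarchy come from. Calling `coarse_to_fine(left, right, stop_level=1)` returns the half-resolution estimate without running the final refinement.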
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 (test) | D1 Error (Overall) | 2.14 | 144 |
| Stereo Matching | KITTI 2015 | D1 Error (All) | 3.74 | 118 |
| Disparity Estimation | KITTI 2015 (test) | D1 Error (bg, all) | 1.8 | 77 |
| Stereo Matching | KITTI 2012 (test) | Outlier Rate (3px, Noc) | 1.53 | 76 |
| Stereo Matching | ETH3D | bad 1.0 | 4.4 | 51 |
| Stereo Matching | Middlebury | Bad Pixel Rate (Thresh 2.0) | 16.5 | 34 |
| Depth Estimation | Gated Stereo Day 1.0 (test) | RMSE | 10.36 | 19 |
| Depth Estimation | Gated Stereo Night 1.0 (test) | RMSE | 12.42 | 19 |
| Stereo Matching | Middlebury v3 | Average Error | 2.07 | 17 |
| Stereo Depth Estimation | SQUID zero-shot | Relative Error (Rel) | 0.9772 | 16 |
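
For readers unfamiliar with the error metrics in the table, the sketch below shows how two of them are commonly computed from a predicted and a ground-truth disparity map, both flattened to lists. It assumes the usual KITTI 2015 convention that a pixel counts as a D1 outlier when its error exceeds both 3 px and 5% of the true disparity, and that a ground-truth value of 0 marks an invalid pixel. The function names `d1_error` and `bad_pixel_rate` are hypothetical, and the official benchmark toolkits may differ in details such as masking.

```python
def d1_error(pred, gt):
    """Percentage of valid pixels whose error is > 3 px and > 5% of the true disparity."""
    # Assumption: ground-truth disparity 0 marks a pixel without a measurement.
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    outliers = sum(
        1 for p, g in valid if abs(p - g) > 3.0 and abs(p - g) > 0.05 * g
    )
    return 100.0 * outliers / len(valid)


def bad_pixel_rate(pred, gt, thresh=2.0):
    """Percentage of valid pixels whose absolute error exceeds `thresh` pixels."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    bad = sum(1 for p, g in valid if abs(p - g) > thresh)
    return 100.0 * bad / len(valid)
```

Both functions return a percentage, so lower is better. The relative 5% clause in D1 matters for large disparities: an error of 4 px on a true disparity of 104 px is not counted as an outlier, while the same error on a true disparity of 10 px is.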