IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching
About
Stereo matching is a core component in many computer vision and robotics systems. Despite significant advances over the last decade, handling matching ambiguities in ill-posed regions and large disparities remains an open challenge. In this paper, we propose a new deep network architecture, called IGEV++, for stereo matching. The proposed IGEV++ constructs Multi-range Geometry Encoding Volumes (MGEV), which encode coarse-grained geometry information for ill-posed regions and large disparities, while preserving fine-grained geometry information for details and small disparities. To construct MGEV, we introduce an adaptive patch matching module that efficiently and effectively computes matching costs for large disparity ranges and/or ill-posed regions. We further propose a selective geometry feature fusion module to adaptively fuse multi-range and multi-granularity geometry features in MGEV. Then, we input the fused geometry features into ConvGRUs to iteratively update the disparity map. MGEV efficiently handles large disparities and ill-posed regions, such as occlusions and textureless regions, and converges rapidly during iterations. Our IGEV++ achieves the best performance on the Scene Flow test set across all disparity ranges, up to 768px. Our IGEV++ also achieves state-of-the-art accuracy on the Middlebury, ETH3D, KITTI 2012, and KITTI 2015 benchmarks. Specifically, IGEV++ achieves a 3.23% 2-pixel outlier rate (Bad 2.0) on the large disparity benchmark, Middlebury, representing error reductions of 31.9% and 54.8% compared to RAFT-Stereo and GMStereo, respectively. We also present a real-time version of IGEV++ that achieves the best performance among all published real-time methods on the KITTI benchmarks. The code is publicly available at https://github.com/gangweix/IGEV and https://github.com/gangweix/IGEV-plusplus.
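To make the relative error reductions quoted in the abstract concrete, the short sketch below back-computes the baseline Bad 2.0 rates they imply, assuming "error reduction" means (baseline - new) / baseline. The function name `implied_baseline` is hypothetical, and the printed values are derived from the abstract's figures rather than taken from any benchmark table.

```python
def implied_baseline(new_error, relative_reduction):
    """Return the baseline error implied by a relative reduction.

    Assumes reduction = (baseline - new_error) / baseline,
    so baseline = new_error / (1 - reduction).
    """
    return new_error / (1.0 - relative_reduction)


# Figures quoted in the abstract (Middlebury, Bad 2.0, in percent).
igev_bad2 = 3.23
reductions = {"RAFT-Stereo": 0.319, "GMStereo": 0.548}

for method, reduction in reductions.items():
    baseline = implied_baseline(igev_bad2, reduction)
    print(f"{method}: implied Bad 2.0 of about {baseline:.2f}%")
```

Under that assumption, the implied baselines come out at roughly 4.74% for RAFT-Stereo and 7.15% for GMStereo.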
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 | D1 Error (All) | 1.79 | 118 |
| Stereo Matching | KITTI 2012 | Error Rate (3px, Noc) | 1.29 | 81 |
| Stereo Matching | KITTI 2012 (test) | -- | -- | 76 |
| Stereo Matching | Scene Flow (test) | EPE | 0.5 | 70 |
| Stereo Matching | ETH3D | bad 1.0 | 1.58 | 51 |
| Stereo Matching | Scene Flow | EPE (px) | 0.52 | 40 |
| Stereo Matching | KITTI 2015 (all pixels) | D1 Error (Background) | 1.31 | 38 |
| Stereo Matching | Middlebury | Bad Pixel Rate (Thresh 2.0) | 7.19 | 34 |
| Stereo Matching | ETH3D | Threshold Error > 1px (All) | 4.45 | 30 |
| Stereo Matching | KITTI 2012 (Noc) | Error Rate (>2px) | 1.56 | 26 |
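
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how end-point error (EPE), a bad-pixel rate at a threshold, and a KITTI-style D1 outlier rate are commonly computed. It assumes flat lists of predicted and ground-truth disparities restricted to valid pixels; the function names and toy values are illustrative only, and each benchmark's official evaluation code may differ in masking and boundary conventions.

```python
def end_point_error(pred, gt):
    """Mean absolute disparity error (EPE), in pixels."""
    errors = [abs(p - g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors)


def bad_rate(pred, gt, threshold):
    """Percentage of pixels whose absolute error exceeds `threshold` px."""
    errors = [abs(p - g) for p, g in zip(pred, gt)]
    return 100.0 * sum(e > threshold for e in errors) / len(errors)


def d1_rate(pred, gt):
    """Percentage of D1 outliers: error > 3 px AND > 5% of the true disparity.

    Assumption: strict inequalities are used here for illustration.
    """
    outliers = [
        abs(p - g) > 3.0 and abs(p - g) > 0.05 * g
        for p, g in zip(pred, gt)
    ]
    return 100.0 * sum(outliers) / len(outliers)


# Toy disparities (hypothetical values, not benchmark data).
pred = [10.0, 20.5, 36.0, 104.0]
gt = [10.0, 20.0, 30.0, 100.0]

print(end_point_error(pred, gt))   # mean of 0, 0.5, 6, 4
print(bad_rate(pred, gt, 2.0))     # two of four pixels exceed 2 px
print(d1_rate(pred, gt))           # only the 6 px error also exceeds 5% of gt
```

The last pixel shows why D1 is more lenient than a fixed threshold at large disparities: its 4 px error exceeds 3 px but is below 5% of the true disparity of 100 px, so it does not count as a D1 outlier.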