IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching
About
Stereo matching is a core component in many computer vision and robotics systems. Despite significant advances over the last decade, handling matching ambiguities in ill-posed regions and large disparities remains an open challenge. In this paper, we propose a new deep network architecture, called IGEV++, for stereo matching. IGEV++ constructs Multi-range Geometry Encoding Volumes (MGEV), which encode coarse-grained geometry information for ill-posed regions and large disparities while preserving fine-grained geometry information for details and small disparities. To construct MGEV, we introduce an adaptive patch matching module that efficiently computes matching costs for large disparity ranges and/or ill-posed regions. We further propose a selective geometry feature fusion module that adaptively fuses the multi-range and multi-granularity geometry features in MGEV. The fused geometry features are then fed into ConvGRUs, which iteratively update the disparity map. MGEV efficiently handles large disparities and ill-posed regions, such as occlusions and textureless regions, and converges rapidly during iterations. IGEV++ achieves the best performance on the Scene Flow test set across all disparity ranges, up to 768px. It also achieves state-of-the-art accuracy on the Middlebury, ETH3D, KITTI 2012, and KITTI 2015 benchmarks. Specifically, IGEV++ achieves a 3.23% 2-pixel outlier rate (Bad 2.0) on the large-disparity benchmark, Middlebury, representing error reductions of 31.9% and 54.8% compared to RAFT-Stereo and GMStereo, respectively. We also present a real-time version of IGEV++ that achieves the best performance among all published real-time methods on the KITTI benchmarks. The code is publicly available at https://github.com/gangweix/IGEV and https://github.com/gangweix/IGEV-plusplus.
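The relative error reductions quoted in the abstract can be read as `(baseline - ours) / baseline`. As a minimal sketch, assuming that definition, the snippet below back-computes the baseline Bad 2.0 rates implied by the abstract's figures. The function names are hypothetical, and the derived baselines are arithmetic consequences of the quoted percentages, not values reported in this listing.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Relative error reduction in percent: (baseline - ours) / baseline * 100."""
    return (baseline - ours) / baseline * 100.0


def implied_baseline(ours: float, reduction_pct: float) -> float:
    """Invert the formula above: the baseline error implied by a quoted reduction."""
    return ours / (1.0 - reduction_pct / 100.0)


# Figures quoted in the abstract (Middlebury, Bad 2.0, in percent).
igev_pp_bad2 = 3.23
raft_stereo_implied = implied_baseline(igev_pp_bad2, 31.9)  # roughly 4.74
gmstereo_implied = implied_baseline(igev_pp_bad2, 54.8)     # roughly 7.15

print(f"Implied RAFT-Stereo Bad 2.0: {raft_stereo_implied:.2f}%")
print(f"Implied GMStereo Bad 2.0:    {gmstereo_implied:.2f}%")
```

Because the quoted reductions are rounded to one decimal place, the implied baselines are only accurate to about the second decimal.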
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 | D1 Error (All) | 1.79 | 118 |
| Stereo Matching | KITTI 2012 | Error Rate (3px, All) | 1.68 | 108 |
| Stereo Matching | KITTI 2012 (test) | -- | -- | 89 |
| Stereo Matching | Scene Flow (test) | EPE | 0.43 | 77 |
| Stereo Matching | ETH3D | bad 1.0 | 1.58 | 51 |
| Stereo Matching | ETH3D | Threshold Error > 1px (Noc) | 3.81 | 50 |
| Stereo Matching | KITTI 2015 (all pixels) | D1 Error (Background) | 1.31 | 48 |
| Stereo Matching | ETH3D (non-occluded) | Bad 1.0 Error | 1.14 | 43 |
| Stereo Matching | Middlebury | Bad Pixel Rate (Thresh 2.0) | 7.19 | 42 |
| Stereo Matching | Scene Flow | EPE (px) | 0.52 | 40 |
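The metrics named in the table (EPE, Bad 1.0/2.0, D1) are all functions of the per-pixel absolute disparity error. The following is a minimal NumPy sketch of how they are commonly computed, not the benchmarks' official evaluation code. The function names are hypothetical, and the D1 rule used here (error above 3 px and above 5% of the ground-truth disparity) is the KITTI 2015 convention as it is usually implemented. Each benchmark applies its own validity and occlusion masks, which are reduced here to an optional `valid` argument.

```python
import numpy as np


def _abs_error(pred, gt, valid=None):
    """Absolute disparity error over valid pixels, plus the matching ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = np.ones(gt.shape, dtype=bool)  # assumption: all pixels are valid
    return np.abs(pred - gt)[valid], gt[valid]


def epe(pred, gt, valid=None):
    """End-point error: mean absolute disparity error in pixels."""
    err, _ = _abs_error(pred, gt, valid)
    return float(err.mean())


def bad_rate(pred, gt, thresh, valid=None):
    """Bad-<thresh>: percentage of valid pixels whose error exceeds `thresh` pixels."""
    err, _ = _abs_error(pred, gt, valid)
    return float((err > thresh).mean() * 100.0)


def d1_rate(pred, gt, valid=None):
    """D1 (assumed KITTI 2015 rule): percentage with error > 3 px and > 5% of ground truth."""
    err, gt_valid = _abs_error(pred, gt, valid)
    outlier = (err > 3.0) & (err > 0.05 * gt_valid)
    return float(outlier.mean() * 100.0)


# Toy 2x2 disparity maps; the per-pixel errors are 0.5, 0.0, 4.0, and 2.5 px.
gt = np.array([[10.0, 20.0], [40.0, 100.0]])
pred = np.array([[10.5, 20.0], [44.0, 102.5]])

print("EPE (px):", epe(pred, gt))
print("Bad 2.0 (%):", bad_rate(pred, gt, 2.0))
print("D1 (%):", d1_rate(pred, gt))
```

On the toy input, two of the four pixels exceed the 2 px threshold, while only the 4 px error also exceeds both the 3 px and 5% limits of D1. The same prediction can therefore score differently under the different metrics in the table.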