ELFNet: Evidential Local-global Fusion for Stereo Matching
About
Although existing stereo matching models have improved steadily, they often raise trustworthiness concerns because they do not estimate uncertainty. In addition, how to effectively leverage the multi-scale and multi-view knowledge of stereo pairs remains unexplored. In this paper, we introduce the Evidential Local-global Fusion (ELF) framework for stereo matching, which uses trustworthy heads to provide both uncertainty estimation and confidence-aware fusion. Instead of predicting the disparity map alone, our model estimates an evidential disparity that accounts for both aleatoric and epistemic uncertainties. Using the normal inverse-Gamma (NIG) distribution as a bridge, the proposed framework performs intra evidential fusion of multi-level predictions and inter evidential fusion between cost-volume-based and transformer-based stereo matching. Extensive experimental results show that the proposed framework exploits multi-view information effectively and achieves state-of-the-art overall performance in both accuracy and cross-domain generalization. The code is available at https://github.com/jimmy19991222/ELFNet.
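The evidential quantities mentioned above can be made concrete with a small sketch. It assumes the standard deep evidential regression parameterization NIG(γ, ν, α, β) (Amini et al., 2020), in which γ is the predicted disparity, the aleatoric uncertainty is β/(α−1) and the epistemic uncertainty is β/(ν(α−1)). For combining two heads it assumes the mixture-of-NIG (MoNIG) summation rule of Ma et al. (2021); the released ELFNet code may implement or order the intra and inter fusion steps differently. The names `NIG`, `fuse` and `fuse_all` are hypothetical, and the code handles a single pixel with plain floats, whereas a real model would apply the same arithmetic elementwise over whole disparity maps.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Sequence


@dataclass(frozen=True)
class NIG:
    """Normal inverse-Gamma parameters for one pixel's disparity (hypothetical type).

    gamma: predicted disparity; nu > 0, alpha > 1, beta > 0: evidence parameters.
    """
    gamma: float
    nu: float
    alpha: float
    beta: float

    def aleatoric(self) -> float:
        # Expected data noise E[sigma^2] = beta / (alpha - 1)
        return self.beta / (self.alpha - 1.0)

    def epistemic(self) -> float:
        # Model uncertainty Var[mu] = beta / (nu * (alpha - 1))
        return self.beta / (self.nu * (self.alpha - 1.0))


def fuse(a: NIG, b: NIG) -> NIG:
    """Combine two NIG predictions with the MoNIG summation rule (assumed, not confirmed for ELFNet)."""
    nu = a.nu + b.nu
    # Evidence-weighted average: the head with larger nu pulls the disparity harder
    gamma = (a.nu * a.gamma + b.nu * b.gamma) / nu
    alpha = a.alpha + b.alpha + 0.5
    # Disagreement between the two heads is added to beta
    beta = (
        a.beta
        + b.beta
        + 0.5 * a.nu * (a.gamma - gamma) ** 2
        + 0.5 * b.nu * (b.gamma - gamma) ** 2
    )
    return NIG(gamma, nu, alpha, beta)


def fuse_all(preds: Sequence[NIG]) -> NIG:
    # Hypothetical helper: left fold over e.g. multi-level outputs (needs >= 1 item)
    return reduce(fuse, preds)


if __name__ == "__main__":
    cost_volume_head = NIG(gamma=10.0, nu=4.0, alpha=3.0, beta=2.0)
    transformer_head = NIG(gamma=12.0, nu=1.0, alpha=2.0, beta=1.0)
    fused = fuse(cost_volume_head, transformer_head)
    print(f"{fused.gamma:.2f} {fused.aleatoric():.3f} {fused.epistemic():.3f}")
    # prints "10.40 1.022 0.204"
```

In this toy example the fused disparity lands closer to the head with more evidence (larger ν), the fused epistemic uncertainty (0.204) is lower than either input's (0.25 and 1.0), and the disagreement between 10.0 and 12.0 is absorbed into β, nudging the aleatoric term slightly above 1.0.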
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 | D1 Error (All) | 9.61 | 118 |
| Stereo Matching | KITTI 2012 | Error Rate (3px, Noc) | 8.67 | 81 |
| Stereo Matching | ETH3D | Threshold Error > 1px (All) | 25.61 | 30 |
| Stereo Matching | Booster Q (test) | Error Rate (> 2%) | 45.52 | 26 |
| Stereo Matching | LayeredFlow E (test) | Error Rate (> 1%) | 93.08 | 13 |
| Stereo Matching | Middlebury half-resolution 2014 v3 (test) | Bad Error Rate (All) | 24.48 | 11 |
| Stereo Matching | Middlebury 2021 | Bad Pixel Rate (Thresh > 2.0, All) | 27.08 | 11 |