Robust Synthetic-to-Real Transfer for Stereo Matching
About
With advancements in domain-generalized stereo matching networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, few studies have investigated whether this robustness survives fine-tuning on real-world data, a process during which the domain generalization ability can be seriously degraded. In this paper, we explore fine-tuning stereo matching networks without compromising their robustness to unseen domains. Our motivation stems from comparing Ground Truth (GT) and Pseudo Labels (PL) for fine-tuning: GT degrades the domain generalization ability, whereas PL preserves it. Empirically, we find that the difference between GT and PL carries valuable information that can regularize networks during fine-tuning. We also propose a framework that exploits this difference during fine-tuning, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network. The core idea is to use the EMA Teacher to measure what the Student has learned and to dynamically improve GT and PL for fine-tuning. We integrate our framework with state-of-the-art networks and evaluate its effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the domain generalization ability during fine-tuning.
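The abstract does not state how the EMA Teacher is updated. The sketch below assumes the standard exponential moving average used in mean-teacher style training, θ_T ← α·θ_T + (1 − α)·θ_S, where θ_T are the EMA Teacher's weights, θ_S the Student's, and α a decay factor (the default α = 0.999 is a hypothetical choice). Plain Python lists stand in for network tensors, and the function name `ema_update` is illustrative, not taken from the authors' code.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> Params:
    """Return new EMA Teacher weights: decay * teacher + (1 - decay) * student.

    Assumption: a generic EMA update, not necessarily the paper's exact rule.
    The inputs are left unmodified; a fresh dict is returned.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    updated: Params = {}
    for name, t_vals in teacher.items():
        s_vals = student[name]
        if len(s_vals) != len(t_vals):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Blend each Teacher weight toward the corresponding Student weight.
        updated[name] = [decay * t + (1.0 - decay) * s for t, s in zip(t_vals, s_vals)]
    return updated


if __name__ == "__main__":
    teacher = {"w": [0.0, 2.0]}
    student = {"w": [1.0, 4.0]}
    print(ema_update(teacher, student, decay=0.5))  # {'w': [0.5, 3.0]}
```

With α close to 1, the EMA Teacher changes slowly and lags behind the Student, so it reflects what the Student has learned over many steps rather than at a single step, while the frozen Teacher keeps the original synthetic-data weights unchanged.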
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 (test) | -- | -- | 144 |
| Stereo Matching | KITTI 2015 | D1 Error (All) | 1.72 | 118 |
| Stereo Matching | KITTI 2012 | -- | -- | 81 |
| Stereo Matching | KITTI 2012 (test) | -- | -- | 76 |
| Stereo Matching | ETH3D | bad 1.0 | 2.28 | 51 |
| Stereo Matching | Middlebury (test) | -- | -- | 47 |
| Stereo Matching | Middlebury | Bad Pixel Rate (Thresh 2.0) | 7.51 | 34 |
| Stereo Matching | ETH3D (test) | Error Rate (Th=1.0) | 1.81 | 30 |
| Stereo Matching | Booster Q (test) | Error Rate (> 2%) | 10.32 | 26 |
| Stereo Matching | DrivingStereo | Error Rate (Sunny) | 1.85 | 14 |
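Several metrics in the table are outlier rates over disparity errors. The sketch below shows two common definitions: the bad-N rate (percentage of valid pixels whose absolute disparity error exceeds N pixels) and KITTI's D1 error (a pixel counts as an outlier when its error exceeds both 3 px and 5% of the ground-truth disparity). It assumes KITTI-style sparse ground truth in which a disparity of 0 or less marks an invalid pixel; official benchmark scripts may differ in masking (e.g., occluded vs. non-occluded regions) and averaging, so these helpers are illustrative, not the evaluation code behind the numbers above.

```python
from typing import List, Sequence, Tuple


def _valid_errors(pred: Sequence[float], gt: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (gt_disparity, abs_error) pairs for valid pixels.

    Assumption: gt <= 0 marks an invalid pixel (KITTI-style sparse ground truth).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    pairs = [(g, abs(p - g)) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def bad_pixel_rate(pred: Sequence[float], gt: Sequence[float], threshold: float) -> float:
    """Percentage of valid pixels whose absolute error exceeds `threshold` pixels."""
    pairs = _valid_errors(pred, gt)
    bad = sum(1 for _, err in pairs if err > threshold)
    return 100.0 * bad / len(pairs)


def d1_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """KITTI D1: percentage of valid pixels with error > 3 px AND > 5% of gt."""
    pairs = _valid_errors(pred, gt)
    bad = sum(1 for g, err in pairs if err > 3.0 and err > 0.05 * g)
    return 100.0 * bad / len(pairs)


if __name__ == "__main__":
    pred = [10.0, 20.0, 30.0, 104.0, 5.0]
    gt = [10.5, 23.5, 30.0, 100.0, 0.0]  # last pixel is invalid
    print(bad_pixel_rate(pred, gt, 1.0))  # 50.0
    print(d1_error(pred, gt))  # 25.0
```

In this toy example the pixel with ground truth 100 has a 4 px error: it counts as bad under the 1 px threshold but not under D1, because 4 px is below 5% of its disparity.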