Robust Synthetic-to-Real Transfer for Stereo Matching

About

With advancements in domain generalized stereo matching networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, few studies have investigated the robustness after fine-tuning them in real-world scenarios, during which the domain generalization ability can be seriously degraded. In this paper, we explore fine-tuning stereo matching networks without compromising their robustness to unseen domains. Our motivation stems from comparing Ground Truth (GT) versus Pseudo Label (PL) for fine-tuning: GT degrades, but PL preserves the domain generalization ability. Empirically, we find the difference between GT and PL implies valuable information that can regularize networks during fine-tuning. We also propose a framework to utilize this difference for fine-tuning, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network. The core idea is to utilize the EMA Teacher to measure what the Student has learned and dynamically improve GT and PL for fine-tuning. We integrate our framework with state-of-the-art networks and evaluate its effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the domain generalization ability during fine-tuning.

Jiawei Zhang, Jiahe Li, Lei Huang, Xiaohan Yu, Lin Gu, Jin Zheng, Xiao Bai • 2024
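
The three-network setup described in the abstract (frozen Teacher, EMA Teacher, Student) can be illustrated with a minimal sketch of the EMA step alone. The abstract does not state the update rule or decay value, so this assumes the standard exponential moving average update, teacher = m * teacher + (1 - m) * student. The function name `ema_update`, the momentum value, and the use of plain dicts of floats in place of network weights are illustrative assumptions, not the authors' code.

```python
from typing import Dict


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               momentum: float = 0.99) -> Dict[str, float]:
    """Return new teacher parameters: m * teacher + (1 - m) * student."""
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must be in [0, 1]")
    if teacher.keys() != student.keys():
        raise KeyError("teacher and student must share parameter names")
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy stand-in for pre-trained weights (hypothetical values).
pretrained = {"w": 1.0, "b": 0.0}
frozen_teacher = dict(pretrained)  # never updated during fine-tuning
ema_teacher = dict(pretrained)     # slowly tracks the student
student = dict(pretrained)         # updated by fine-tuning

# Stand-in for one fine-tuning step that changes the student (assumed values).
student = {"w": 2.0, "b": 1.0}

# The EMA Teacher moves a small step toward the student;
# the frozen Teacher keeps the pre-trained weights.
ema_teacher = ema_update(ema_teacher, student, momentum=0.9)
```

With a momentum close to 1 the EMA Teacher changes slowly, so it summarizes what the Student has learned over many steps rather than following any single update.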

Related benchmarks

Task	Dataset	Metric	Value	
Stereo Matching	KITTI 2015 (test)	--	--	233
Stereo Matching	KITTI 2015	D1 Error (All)	1.72	118
Stereo Matching	KITTI 2012	--	--	108
Stereo Matching	KITTI 2012 (test)	--	--	105
Stereo Matching	Middlebury (test)	--	--	60
Stereo Matching	ETH3D	bad 1.0	2.28	57
Stereo Matching	Middlebury	Bad Pixel Rate (Thresh 2.0)	7.51	53
Stereo Matching	ETH3D (test)	Error Rate (Th=1.0)	1.81	34
Stereo Matching	Booster Q (test)	Error Rate (> 2%)	10.32	26
Stereo Matching	DrivingStereo	Error Rate (Sunny)	1.85	14

Showing 10 of 13 rows
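
The metric names in the table (bad 1.0, Bad Pixel Rate at threshold 2.0, D1 Error) are not defined on this page. The sketch below assumes the commonly used definitions: a bad-pixel rate is the percentage of valid pixels whose absolute disparity error exceeds a threshold, and KITTI-style D1 counts a pixel as wrong when its error exceeds both 3 px and 5% of the true disparity. The function names, the flat-list input, and the use of `None` for pixels without ground truth are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def _valid_pairs(pred: Sequence[float],
                 gt: Sequence[Optional[float]]) -> List[Tuple[float, float]]:
    """Keep only pixels that have a ground-truth disparity."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g is not None]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def bad_pixel_rate(pred: Sequence[float],
                   gt: Sequence[Optional[float]],
                   threshold: float) -> float:
    """Percentage of valid pixels with |pred - gt| > threshold (in pixels)."""
    pairs = _valid_pairs(pred, gt)
    bad = sum(1 for p, g in pairs if abs(p - g) > threshold)
    return 100.0 * bad / len(pairs)


def d1_error(pred: Sequence[float],
             gt: Sequence[Optional[float]]) -> float:
    """KITTI-style D1: error > 3 px AND error > 5% of the true disparity."""
    pairs = _valid_pairs(pred, gt)
    bad = sum(1 for p, g in pairs
              if abs(p - g) > 3.0 and abs(p - g) > 0.05 * abs(g))
    return 100.0 * bad / len(pairs)


# Toy disparities (made-up values); the last pixel has no ground truth.
pred = [10.0, 20.5, 36.0, 104.0, 50.0]
gt = [10.2, 20.0, 30.0, 100.0, None]

bad_1 = bad_pixel_rate(pred, gt, threshold=1.0)  # errors 6.0 and 4.0 exceed 1 px
d1 = d1_error(pred, gt)                          # only the 6.0 px error also exceeds 5%
```

In this toy example the 4 px error on a true disparity of 100 counts as bad under the 1 px threshold but not under D1, because it stays below 5% of the true disparity.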
