An Improved RaftStereo Trained with A Mixed Dataset for the Robust Vision Challenge 2022

About

Stereo matching is a fundamental problem in computer vision. Despite recent progress from deep learning, improving robustness remains essential when deploying stereo-matching models in real-world applications. Rather than following the common practice of designing an elaborate model to achieve robustness, we argue that training on multiple available datasets is a cheaper way to increase generalization ability. Specifically, this report presents an improved RaftStereo trained on a mixture of seven public datasets for the Robust Vision Challenge (denoted iRaftStereo_RVC). When evaluated on the training sets of Middlebury, KITTI-2015, and ETH3D, the model outperforms its counterparts trained on only one dataset, such as the popular Sceneflow. After fine-tuning the pre-trained model on the three datasets of the challenge, it ranks 2nd on the stereo leaderboard, demonstrating the benefits of mixed-dataset pre-training.
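
The idea of pooling several datasets into one training set can be illustrated with a minimal sketch. This summary does not say how the seven datasets are combined, so plain concatenation with uniform shuffling is an assumption here, and `MixedDataset`, `dataset_a`, and `dataset_b` are hypothetical names with string placeholders standing in for stereo pairs. PyTorch's `torch.utils.data.ConcatDataset` offers comparable behavior for real training code.

```python
import bisect
import itertools
import random


class MixedDataset:
    """Present several indexable datasets as one (assumed: plain concatenation)."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals of dataset lengths, e.g. sizes [3, 5] -> [3, 8].
        self.cumulative_sizes = list(
            itertools.accumulate(len(d) for d in self.datasets)
        )

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find which underlying dataset the global index falls into.
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, index)
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][index - offset]


# Hypothetical placeholder datasets; real samples would be stereo image pairs
# with ground-truth disparity.
dataset_a = [f"a{i}" for i in range(3)]
dataset_b = [f"b{i}" for i in range(5)]
mixed = MixedDataset([dataset_a, dataset_b])

# One shuffled epoch over the pooled samples (uniform sampling is an assumption).
rng = random.Random(0)
epoch_order = list(range(len(mixed)))
rng.shuffle(epoch_order)
epoch_samples = [mixed[i] for i in epoch_order]
```

With uniform shuffling, larger datasets contribute proportionally more samples per epoch; whether the report reweights the datasets is not stated in this summary.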

Hualie Jiang, Rui Xu, Wenjie Jiang • 2022

Related benchmarks

Task	Dataset	Metric	Result
Stereo Matching	ETH3D (non-occluded)	Bad 1.0 Error	1.62	52
Stereo Matching	KITTI 2015 (all pixels)	D1 Error (Background)	1.88	48
Stereo Matching	KITTI Noc 2015	D1 Error (Background)	1.76	42
Stereo Matching	Middlebury v3	Bad Pixel Rate (Thresh 2.0)	13.3	35
Stereo Matching	KITTI 2015 (non-occluded)	D1 Error (Background)	1.76	25
Stereo Matching	Middlebury non-occluded	Bad Pixel Rate (2.0)	8.07	20
Stereo Matching	ETH3D (All)	D1 Error	1.88	19
Stereo Matching	ETH3D RVC (all)	Bad 1.0 Error	1.88	9
Stereo Matching	KITTI RVC 2015 (all)	D1 Error (bg)	1.88	9
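
For readers unfamiliar with the metric names in the table, the sketch below shows how such error rates are commonly computed. It assumes the usual benchmark definitions: a "bad" pixel has an absolute disparity error above a threshold in pixels (1.0 for ETH3D, 2.0 for Middlebury), and a KITTI D1 outlier has an error above both 3 px and 5% of the ground-truth disparity. The function names and the toy values are hypothetical, and the official benchmark toolkits may differ in details such as masking or strict versus non-strict comparison.

```python
from typing import Optional, Sequence


def bad_pixel_rate(
    pred: Sequence[float], gt: Sequence[Optional[float]], threshold: float
) -> float:
    """Percentage of valid pixels with absolute disparity error above threshold."""
    # Pixels without ground truth (None) are skipped.
    errors = [abs(p - g) for p, g in zip(pred, gt) if g is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * sum(e > threshold for e in errors) / len(errors)


def d1_error(pred: Sequence[float], gt: Sequence[Optional[float]]) -> float:
    """Percentage of valid pixels with error > 3 px AND > 5% of ground truth."""
    valid = [(p, g) for p, g in zip(pred, gt) if g is not None]
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    outliers = sum(
        abs(p - g) > 3.0 and abs(p - g) > 0.05 * g for p, g in valid
    )
    return 100.0 * outliers / len(valid)


# Toy flattened disparity maps (hypothetical values, in pixels).
pred = [10.0, 20.5, 33.0, 100.0, 50.0, 7.0]
gt = [10.2, 20.0, 30.0, 104.0, 45.0, None]

bad_1 = bad_pixel_rate(pred, gt, threshold=1.0)  # ETH3D-style Bad 1.0
bad_2 = bad_pixel_rate(pred, gt, threshold=2.0)  # Middlebury-style Bad 2.0
d1 = d1_error(pred, gt)                          # KITTI-style D1
```

In the toy data, the pixel with ground truth 104.0 has a 4 px error but is not a D1 outlier because 4 px is below 5% of 104; this relative tolerance is what distinguishes D1 from a fixed-threshold bad-pixel rate.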

