
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

About

Stereo matching is a key technique for metric depth estimation in computer vision and robotics. Real-world challenges such as occlusion and textureless regions hinder accurate disparity estimation from binocular matching cues. Recently, monocular relative depth estimation built on vision foundation models has shown remarkable generalization. To exploit monocular depth cues for robust stereo matching, we therefore incorporate a robust monocular relative depth model into the recurrent stereo matching framework, yielding a new framework for depth foundation model-based stereo matching, DEFOM-Stereo. In the feature extraction stage, we construct a combined context and matching feature encoder that integrates features from conventional CNNs and DEFOM. In the update stage, we use the depth predicted by DEFOM to initialize the recurrent disparity and introduce a scale update module that refines the disparity at the correct scale. Experiments show that DEFOM-Stereo has much stronger zero-shot generalization than state-of-the-art (SOTA) methods. Moreover, DEFOM-Stereo achieves top performance on the KITTI 2012, KITTI 2015, Middlebury, and ETH3D benchmarks, ranking first on many metrics. In the joint evaluation of the Robust Vision Challenge, a single model simultaneously outperforms previous models on each of the individual benchmarks, further demonstrating its outstanding capabilities.
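The abstract says DEFOM's predicted depth initializes the recurrent disparity and a scale update module then corrects its scale. The underlying geometry is that a rectified stereo pair has disparity d = f * B / Z, so disparity is proportional to inverse depth, while monocular models typically predict inverse depth only up to an unknown scale (and often a shift). The sketch below is not the paper's method: it swaps in a plain per-image least-squares scale-and-shift fit against a reference disparity, a common baseline for aligning relative depth. The helpers `align_scale_shift` and `disparity_to_depth`, the reference map, and the camera parameters (focal length in pixels, baseline in meters) are all hypothetical.

```python
import numpy as np

def align_scale_shift(rel_inv_depth, ref_disp, valid=None):
    """Hypothetical helper: fit ref_disp ~= s * rel_inv_depth + t by least squares."""
    if valid is None:
        valid = np.isfinite(ref_disp)  # use every pixel with a finite reference value
    x = rel_inv_depth[valid]
    y = ref_disp[valid]
    # Design matrix with columns [x, 1], so the solution vector is (scale, shift)
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Apply the single global (s, t) to the whole relative map
    return s * rel_inv_depth + t, float(s), float(t)

def disparity_to_depth(disp, focal_px, baseline_m, eps=1e-6):
    """Rectified stereo geometry: Z = f * B / d, clamping d to avoid division by zero."""
    return focal_px * baseline_m / np.maximum(disp, eps)

# Toy usage: a relative inverse-depth map that is an exact affine function of disparity
rng = np.random.default_rng(0)
rel = rng.uniform(0.1, 1.0, size=(4, 5))
ref = 40.0 * rel + 2.0  # hypothetical stand-in for a coarse metric disparity estimate
init_disp, s, t = align_scale_shift(rel, ref)
depth = disparity_to_depth(init_disp, focal_px=700.0, baseline_m=0.54)
print(round(s, 3), round(t, 3))
```

On this noise-free toy input the fit recovers s close to 40 and t close to 2. A single global (s, t) cannot fix scale errors that vary across the image, which is presumably one reason the paper refines scale inside the recurrent updates rather than once up front.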

Hualie Jiang, Zhiqiang Lou, Laiyan Ding, Rui Xu, Minglang Tan, Wenjie Jiang, Rui Huang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Stereo Matching | KITTI 2015 | D1 Error (All) | 1.33 | 118
Stereo Matching | KITTI 2012 | Error Rate (3px, Noc) | 0.94 | 81
Stereo Matching | KITTI 2012 (test) | -- | -- | 76
Stereo Matching | ETH3D | bad 1.0 | 0.78 | 51
Stereo Matching | Scene Flow | EPE (px) | 0.42 | 40
Stereo Matching | KITTI 2015 (all pixels) | D1 Error (Background) | 1.25 | 38
Stereo Matching | Middlebury | Bad Pixel Rate (Thresh 2.0) | 5.02 | 34
Stereo Matching | ETH3D | Threshold Error > 1px (All) | 0.78 | 30
Stereo Matching | KITTI 2012 (Noc) | Error Rate (>2px) | 1.43 | 26
Stereo Matching | KITTI 2012 (All split) | Error Rate (>2px) | 1.79 | 26
Showing 10 of 24 rows
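The table mixes several disparity error measures. A minimal NumPy sketch of how they are commonly defined follows, assuming dense predicted and ground-truth disparity maps in pixels and treating gt > 0 as valid ground truth. The function `stereo_metrics` is hypothetical; the official KITTI, Middlebury, and ETH3D evaluation tools differ in details such as occlusion masks, non-occluded versus all-pixel splits, and weighting, so this will not reproduce leaderboard numbers exactly.

```python
import numpy as np

def stereo_metrics(pred, gt, valid=None):
    """Hypothetical helper: common disparity error measures over valid pixels."""
    if valid is None:
        valid = gt > 0  # assumption: 0 marks pixels without ground truth
    err = np.abs(pred - gt)[valid]
    g = gt[valid]
    return {
        # End-point error: mean absolute disparity error in pixels
        "epe": float(err.mean()),
        # "bad tau": percentage of pixels whose error exceeds tau pixels
        "bad1": float(100.0 * np.mean(err > 1.0)),
        "bad2": float(100.0 * np.mean(err > 2.0)),
        "bad3": float(100.0 * np.mean(err > 3.0)),
        # KITTI 2015 D1: outlier if error > 3 px AND > 5% of the true disparity
        "d1": float(100.0 * np.mean((err > 3.0) & (err > 0.05 * g))),
    }

# Toy example: four pixels, true disparity 100 px, errors of 0.5, 1.5, 2.5 and 4.0 px
gt = np.full((2, 2), 100.0)
pred = gt + np.array([[0.5, 1.5], [2.5, 4.0]])
m = stereo_metrics(pred, gt)
print(m)  # {'epe': 2.125, 'bad1': 75.0, 'bad2': 50.0, 'bad3': 25.0, 'd1': 0.0}
```

The 4 px error counts toward bad 3.0 but not toward D1, because 5% of a 100 px disparity is 5 px; D1 is therefore more lenient than a fixed 3 px threshold at large disparities (near objects).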
