
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

About

We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By elegantly coupling these complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues. Following this design, our framework introduces novel cost volume fusion mechanisms that effectively handle critical challenges such as textureless regions, occlusions, and non-Lambertian surfaces. Through our novel optical illusion dataset, MonoTrap, and extensive evaluation across multiple benchmarks, we demonstrate that our synthetic-only trained model achieves state-of-the-art results in zero-shot generalization, significantly outperforming existing solutions while showing remarkable robustness to challenging cases such as mirrors and transparencies.
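To make the dual-branch idea concrete, here is a minimal numpy sketch of one plausible fusion scheme: a classical absolute-difference stereo cost volume is combined with a soft "prior" volume derived from a monocular disparity map, so the mono branch can resolve pixels where geometric matching is unreliable. The function names, the Gaussian-style prior, and the convex-combination fusion are illustrative assumptions, not the paper's actual mechanism.

```python
import numpy as np

def stereo_cost_volume(left, right, max_disp):
    """Per-pixel absolute-difference matching cost for each candidate
    disparity d: compare left[x] against right[x - d]."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost

def mono_prior_volume(mono_disp, max_disp, sigma=1.0):
    """Soft cost volume from a monocular disparity prior: low cost near
    the predicted disparity, growing quadratically away from it."""
    d = np.arange(max_disp)[:, None, None]
    return (d - mono_disp[None]) ** 2 / (2.0 * sigma ** 2)

def fuse(stereo_cost, mono_cost, alpha=0.5):
    """Illustrative fusion: fall back to the mono prior where the stereo
    cost is undefined, then take a convex combination of the two."""
    finite = np.where(np.isfinite(stereo_cost), stereo_cost, mono_cost)
    return (1.0 - alpha) * finite + alpha * mono_cost

# Toy usage: a right image shifted by 2 px, mono prior agreeing at 2 px.
rng = np.random.default_rng(0)
left = rng.random((8, 16))
right = np.roll(left, -2, axis=1)          # true disparity ~ 2
mono = np.full((8, 16), 2.0)               # mono branch prediction
fused = fuse(stereo_cost_volume(left, right, 4), mono_prior_volume(mono, 4))
disp = fused.argmin(axis=0)                # winner-take-all disparity
```

In textureless or mirrored regions the stereo term is ambiguous or wrong, and the prior term dominates the argmin; where matching is reliable, the geometric cost sharpens the estimate. The actual model learns this trade-off rather than using a fixed `alpha`.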

Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Stereo Matching | KITTI 2015 | D1 Error (All) | 3.93 | 118
Stereo Matching | KITTI 2012 | Error Rate (3px, Noc) | 3.52 | 81
Stereo Matching | ETH3D | Threshold Error > 1px (All) | 1.66 | 30
Stereo Matching | Booster Q (test) | Error Rate (> 2%) | 6.52 | 26
Stereo Depth Estimation | SQUID zero-shot | Relative Error (Rel) | 0.0952 | 16
Stereo Matching | LayeredFlow E (test) | Error Rate (> 1%) | 51.24 | 13
Stereo Depth Estimation | TartanAir underwater (test) | Relative Error (Rel) | 0.0592 | 13
Stereo Matching | Middlebury 2014 v3, half resolution (test) | Bad Error Rate (All) | 6.96 | 11
Stereo Matching | Middlebury 2021 | Bad Pixel Rate (Thresh > 2.0, All) | 7.97 | 11
DSM Reconstruction | Omaha Synchronic DFC2019 | Altitude MAE (m) | 1.04 | 8
(Showing 10 of 15 benchmark rows.)

Other info

Code
