Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
About
We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By coupling these two complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues. Building on this design, our framework introduces novel cost-volume fusion mechanisms that effectively handle critical challenges such as textureless regions, occlusions, and non-Lambertian surfaces. Through our novel optical-illusion dataset, MonoTrap, and extensive evaluation across multiple benchmarks, we demonstrate that our synthetic-only trained model achieves state-of-the-art zero-shot generalization, significantly outperforming existing solutions while showing remarkable robustness to challenging cases such as mirrors and transparencies.
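To make the core idea concrete, below is a minimal, schematic sketch of fusing a geometric stereo cost volume with a monocular depth prior. It is not the authors' implementation: the SAD matching cost, the Gaussian prior around the mono disparity, and the multiplicative fusion are illustrative assumptions standing in for the learned dual-branch fusion described above.

```python
import numpy as np

def stereo_cost_volume(left, right, max_disp):
    """Per-pixel absolute-difference matching cost for each disparity hypothesis."""
    H, W = left.shape
    cost = np.full((max_disp, H, W), np.inf)  # inf where the shift leaves the image
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :W - d])
    return cost

def mono_prior_volume(mono_disp, max_disp, sigma=1.0):
    """Soft likelihood over disparities, centred on a monocular disparity estimate
    (standing in for the VFM branch)."""
    d = np.arange(max_disp, dtype=float)[:, None, None]
    return np.exp(-0.5 * ((d - mono_disp[None]) / sigma) ** 2)

def fuse(cost, prior, eps=1e-8):
    """Turn matching costs into a distribution over disparities, reweight it with
    the mono prior, and take the winner-take-all disparity."""
    prob = np.exp(-cost)                                  # soft-min over disparities
    prob /= prob.sum(axis=0, keepdims=True) + eps
    return (prob * prior).argmax(axis=0)
```

On a textureless patch the stereo cost is flat across disparities, so the mono prior dominates the product; where the stereo cost has a sharp minimum, it dominates instead — the same complementarity the dual-branch design exploits.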
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 | D1 Error (All) | 3.93 | 118 |
| Stereo Matching | KITTI 2012 | Error Rate (3px, Noc) | 3.52 | 81 |
| Stereo Matching | ETH3D | Threshold Error > 1px (All) | 1.66 | 30 |
| Stereo Matching | Booster Q (test) | Error Rate (> 2%) | 6.52 | 26 |
| Stereo Depth Estimation | SQUID zero-shot | Relative Error (Rel) | 0.0952 | 16 |
| Stereo Matching | LayeredFlow E (test) | Error Rate (> 1%) | 51.24 | 13 |
| Stereo Depth Estimation | TartanAir underwater (test) | Relative Error (Rel) | 0.0592 | 13 |
| Stereo Matching | Middlebury half-resolution 2014 v3 (test) | Bad Error Rate (All) | 6.96 | 11 |
| Stereo Matching | Middlebury 2021 | Bad Pixel Rate (Thresh > 2.0, All) | 7.97 | 11 |
| DSM Reconstruction | Omaha Synchronic DFC2019 | Altitude MAE (m) | 1.04 | 8 |