
StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

About

We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.

Tjark Behrens, Anton Obukhov, Bingxin Ke, Fabio Tosi, Matteo Poggi, Konrad Schindler • 2025

Related benchmarks

Task                    Dataset                   Metric   Result   Rank
Stereo View Synthesis   Middlebury 2014 (full)    iSQoE    0.6829   5
Stereo View Synthesis   Drivingstereo (full)      iSQoE    78.29    5
Stereo View Synthesis   Booster (test)            iSQoE    67.64    5
Stereo View Synthesis   LayeredFlow (test)        iSQoE    0.7489   5

