Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis

About

The rising popularity of immersive visual experiences has increased interest in stereoscopic 3D video generation. Despite significant advances in video synthesis, creating 3D videos remains challenging due to the relative scarcity of 3D video data. We propose a simple approach for transforming a text-to-video generator into a video-to-stereo generator. Given an input video, our framework automatically produces the video frames from a shifted viewpoint, enabling a compelling 3D effect. Prior and concurrent approaches for this task typically operate in multiple phases, first estimating video disparity or depth, then warping the video accordingly to produce a second view, and finally inpainting the disoccluded regions. This approach inherently fails when the scene involves specular surfaces or transparent objects. In such cases, single-layer disparity estimation is insufficient, resulting in artifacts and incorrect pixel shifts during warping. Our work bypasses these restrictions by directly synthesizing the new viewpoint, avoiding any intermediate steps. This is achieved by leveraging a pre-trained video model's priors on geometry, object materials, optics, and semantics, without relying on external geometry models or manually disentangling geometry from the synthesis process. We demonstrate the advantages of our approach in complex, real-world scenarios featuring diverse object materials and compositions. See videos on https://video-eye2eye.github.io
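The multi-phase baseline that the abstract contrasts against (estimate disparity, warp, then inpaint) can be illustrated with a minimal NumPy sketch of the warping step. This is not the authors' method and not any specific prior system; the function name `forward_warp`, the sign convention (right-view column = left-view column minus disparity), and the toy inputs are assumptions made for illustration. It shows how a single disparity value per pixel leaves disoccluded holes that a later inpainting phase would have to fill.

```python
import numpy as np

def forward_warp(left, disparity):
    """Shift each pixel of `left` horizontally by its disparity to form a right view.

    left: (H, W, 3) array; disparity: (H, W) array of non-negative pixel shifts.
    Returns the warped image and a boolean mask of disoccluded (unfilled) pixels.
    Assumed convention: x_right = x_left - disparity.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    # Largest disparity written to each target pixel so far; nearer surfaces win.
    best = np.full((h, w), -np.inf)
    ys, xs = np.mgrid[0:h, 0:w]
    target = xs - np.rint(disparity).astype(int)
    valid = (target >= 0) & (target < w)
    for y, x, t, d in zip(ys[valid], xs[valid], target[valid], disparity[valid]):
        if d > best[y, t]:
            best[y, t] = d
            right[y, t] = left[y, x]
    # Pixels that no source pixel mapped to are disocclusions.
    holes = np.isinf(best)
    return right, holes

# Toy example: one row, a foreground object (disparity 2) at columns 2 and 3.
left = np.arange(6).reshape(1, 6, 1).repeat(3, axis=2).astype(np.uint8)
disparity = np.array([[0, 0, 2, 2, 0, 0]], dtype=float)
right, holes = forward_warp(left, disparity)
print(holes[0])  # True marks columns that need inpainting
```

Because each pixel carries exactly one disparity, a reflective or transparent surface (where two depths are visible at the same pixel) cannot be represented in this scheme, which is the failure mode the abstract describes.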

Michal Geyer, Omer Tov, Linyi Jin, Richard Tucker, Inbar Mosseri, Tali Dekel, Noah Snavely • 2025

Related benchmarks

Task | Dataset | Result | Rank
Stereoscopic Video Generation | Stereo4D (test) | iSQoE 0.517 | 7
Stereoscopic Video Generation | iPhone (test) | iSQoE 0.507 | 6
Stereoscopic Video Generation | AVP (test) | iSQoE 0.507 | 6
Mono-to-stereo video conversion | Stereo4D (test) | PSNR 21.1 | 6
Mono-to-stereo video conversion | Apple Vision Pro Spatial Video (out-of-distribution) | PSNR 20.6 | 5
Monocular to Binocular Stereo Video Conversion | Spatial Video dataset iPhone portion (test) | PSNR 20.2 | 5
3D Video Generation | iPhone and Apple Vision Pro (AVP) datasets | Equal Preference Count 45 | 4
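
For readers unfamiliar with the PSNR figures listed above, the standard definition can be sketched as follows. The page does not state the evaluation protocol behind these benchmark entries (peak value, color space, per-frame versus per-video averaging), so the `max_value=255.0` default and the helper name `psnr` are assumptions for illustration only.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of the same shape.

    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(max_value ** 2 / mse)

# A constant error of 25.5 on a 0..255 scale gives MSE = 650.25, i.e. about 20 dB.
ref = np.zeros((4, 4))
est = ref + 25.5
print(round(psnr(ref, est), 2))
```

Higher values indicate a generated right-eye view that is closer, pixel by pixel, to the ground-truth view.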