Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding

About

The growing demand for immersive 3D content calls for automated monocular-to-stereo video conversion. We present Elastic3D, a controllable, direct end-to-end method for upgrading a conventional video to a binocular one. Our approach, based on (conditional) latent diffusion, avoids the artifacts caused by explicit depth estimation and warping. The key to its output quality is a novel guided VAE decoder that ensures sharp, epipolar-consistent stereo video. Moreover, our method gives the user control over the strength of the stereo effect (more precisely, the disparity range) at inference time, via an intuitive scalar tuning knob. Experiments on three different datasets of real-world stereo videos show that our method outperforms both traditional warping-based and recent warping-free baselines and sets a new standard for reliable, controllable stereo video conversion. Video samples are available on the project page: https://elastic3d.github.io.

Nando Metzger, Prune Truong, Goutam Bhat, Konrad Schindler, Federico Tombari • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Stereoscopic Video Generation | Stereo4D (test) | iSQoE | 0.515 | 7
Mono-to-stereo video conversion | Stereo4D (test) | PSNR | 26.1 | 6
Stereoscopic Video Generation | AVP (test) | iSQoE | 0.509 | 6
Stereoscopic Video Generation | iPhone (test) | iSQoE | 0.506 | 6
Mono-to-stereo video conversion | Apple Vision Pro Spatial Video (out-of-distribution) | PSNR | 25.9 | 5
Mono-to-stereo video conversion | Ego4D (test) | PSNR | 19.8 | 5
Monocular to Binocular Stereo Video Conversion | Spatial Video dataset iPhone portion (test) | PSNR | 22.5 | 5
3D Video Generation | iPhone and Apple Vision Pro (AVP) datasets | Equal Preference Count | 45 | 4
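
Several rows above report PSNR. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE), in decibels. The function name `psnr`, the flat-sequence input, and the default peak value of 255 (8-bit images) are assumptions made for illustration. The exact evaluation protocol behind these benchmark numbers (resolution, color space, per-frame averaging) is not stated on this page and may differ.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    sequences of pixel intensities (hypothetical helper, standard formula)."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        # Identical inputs: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Toy example: a generated right-eye view compared with ground truth.
    ground_truth = [100, 100, 100, 100]
    generated = [110, 90, 110, 90]
    print(f"PSNR: {psnr(ground_truth, generated):.2f} dB")
```

Higher PSNR means the generated view is closer to the ground-truth view pixel by pixel. It does not measure perceptual stereo quality, which is presumably why the table also lists iSQoE and a preference count.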