
Learning to Render Novel Views from Wide-Baseline Stereo Pairs

About

We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. In this challenging regime, 3D scene points are regularly observed only once, requiring prior-based reconstruction of scene geometry and appearance. We find that existing approaches to novel view synthesis from sparse observations fail due to recovering incorrect 3D geometry and due to the high cost of differentiable rendering that precludes their scaling to large-scale training. We take a step towards resolving these shortcomings by formulating a multi-view transformer encoder, proposing an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray, and a lightweight cross-attention-based renderer. Our contributions enable training of our method on a large-scale real-world dataset of indoor and outdoor scenes. We demonstrate that our method learns powerful multi-view geometry priors while reducing the rendering time. We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.
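The two core ideas named in the abstract, image-space epipolar line sampling and a lightweight cross-attention renderer, can be sketched as follows. This is an illustrative simplification in NumPy, not the authors' implementation: the function names, the nearest-neighbour feature lookup, and the single-head attention are all assumptions; the actual method uses learned multi-view transformer features.

```python
import numpy as np

def sample_epipolar_features(feat_map, K, R, t, ray_o, ray_d, depths):
    """Project points along a target ray into a source view and gather
    features along the induced epipolar line (nearest-neighbour lookup).

    feat_map: (H, W, C) source-image feature map
    K:        (3, 3) source intrinsics; R, t: world-to-camera extrinsics
    ray_o, ray_d: (3,) target-ray origin and direction in world space
    depths:   (S,) sample depths along the target ray
    """
    pts = ray_o[None] + depths[:, None] * ray_d[None]     # (S, 3) world points
    cam = (R @ pts.T + t[:, None]).T                      # (S, 3) camera coords
    uvw = (K @ cam.T).T                                   # homogeneous pixels
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)    # (S, 2) pixel coords
    H, W, _ = feat_map.shape
    x = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    y = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat_map[y, x]                                 # (S, C) line samples

def cross_attention_render(query, keys, values):
    """Single-head cross-attention: one target-ray query attends over the
    S epipolar samples and returns a blended feature vector."""
    logits = keys @ query / np.sqrt(query.shape[0])       # (S,) scaled scores
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                       # softmax weights
    return w @ values                                     # (C,) rendered feature
```

In this reading, the per-ray cost is one epipolar gather plus one small attention over S samples, which is what allows rendering without volumetric integration along the ray.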

Yilun Du, Cameron Smith, Ayush Tewari, Vincent Sitzmann • 2023

Related benchmarks

| Task                       | Dataset                       | PSNR   | Rank |
|----------------------------|-------------------------------|--------|------|
| Novel View Synthesis       | Mip-NeRF 360 (test)           | 14     | 166  |
| Novel View Synthesis       | RealEstate10K                 | 24.78  | 116  |
| Novel View Synthesis       | DTU (test)                    | 11.35  | 82   |
| Novel View Synthesis       | ACID                          | 26.88  | 51   |
| Novel View Synthesis       | RealEstate-10K 2-view         | 24.78  | 28   |
| Novel View Synthesis       | ACID (test)                   | 26.88  | 18   |
| Novel View Synthesis       | RealEstate 10k (RE10k) (test) | 24.78  | 16   |
| Scene-level View Synthesis | RealEstate10k (val)           | 24.78  | 15   |
| Novel View Reconstruction  | RE10K                         | 24.78  | 12   |
| Novel View Synthesis       | RE10K Large                   | 25.897 | 12   |
Showing 10 of 26 rows
