LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

About

We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens that serve as a fully learned scene representation, and decodes novel-view images from them; and (2) a decoder-only LVSM, which maps input images directly to novel-view outputs, eliminating intermediate scene representations entirely. Both models avoid the 3D inductive biases of previous methods, from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps), and address novel view synthesis with a fully data-driven approach. The encoder-decoder model offers faster inference because its latent representation is independent of the target views. The decoder-only LVSM achieves superior quality, scalability, and zero-shot generalization, outperforming previous state-of-the-art methods by 1.5 to 3.5 dB PSNR. Comprehensive evaluations across multiple datasets show that both LVSM variants achieve state-of-the-art novel view synthesis quality. Notably, our models surpass all previous methods even with reduced computational resources (1-2 GPUs). Please see our website for more details: https://haian-jin.github.io/projects/LVSM/ .
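The difference between the two architectures is mostly about which tokens the transformer sees. The sketch below is a toy NumPy illustration of that data flow, not the authors' implementation. It assumes the input and target views have already been turned into D-dimensional tokens (for example by patchifying images together with camera-ray embeddings; that step is not shown). It uses random, untrained single-head attention layers. All sizes and helper names (`encode`, `decode`, `decoder_only`, `make_transformer`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64          # token width (hypothetical)
PATCHES = 16    # tokens per view (hypothetical)
N_LATENT = 8    # fixed number of 1D latent tokens (hypothetical)


def attention_block(x, w_q, w_k, w_v):
    """Single-head self-attention with a residual connection (no MLP or norm, for brevity)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v


def make_transformer(depth):
    """Random (untrained) weights: depth layers of (W_q, W_k, W_v)."""
    return [tuple(rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
            for _ in range(depth)]


def run(layers, x):
    for w_q, w_k, w_v in layers:
        x = attention_block(x, w_q, w_k, w_v)
    return x


# Encoder-decoder: compress input tokens into a fixed-size latent scene representation.
def encode(input_tokens, latent_init, encoder):
    x = np.concatenate([input_tokens, latent_init])
    return run(encoder, x)[-latent_init.shape[0]:]  # keep only the latent tokens


def decode(latents, target_tokens, decoder):
    x = np.concatenate([latents, target_tokens])
    return run(decoder, x)[latents.shape[0]:]  # keep only the target-view tokens


# Decoder-only: input and target tokens share one transformer; no latent bottleneck.
def decoder_only(input_tokens, target_tokens, decoder):
    x = np.concatenate([input_tokens, target_tokens])
    return run(decoder, x)[input_tokens.shape[0]:]


# Two input views and three target views, already tokenized (random stand-ins).
input_tokens = rng.standard_normal((2 * PATCHES, D))
targets = [rng.standard_normal((PATCHES, D)) for _ in range(3)]
latent_init = rng.standard_normal((N_LATENT, D))  # would be learned in practice

encoder, decoder = make_transformer(2), make_transformer(2)
latents = encode(input_tokens, latent_init, encoder)          # run once per scene
rendered_ed = [decode(latents, t, decoder) for t in targets]  # reuses the latents

full = make_transformer(4)
rendered_do = [decoder_only(input_tokens, t, full) for t in targets]  # re-reads inputs

print(latents.shape, rendered_ed[0].shape, rendered_do[0].shape)  # (8, 64) (16, 64) (16, 64)
```

In this toy setup, each encoder-decoder render attends over 8 + 16 = 24 tokens once the latents exist. Each decoder-only render attends over 32 + 16 = 48 tokens. That gap is one way to picture why the latent representation speeds up inference. A per-token output head that unpatchifies tokens back to RGB pixels is omitted, and the real models' layer design and attention masking may differ.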

Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, Zexiang Xu • 2024

Related benchmarks

Task	Dataset	Result
Novel View Synthesis	Tanks&Temples (test)	--	289
Novel View Synthesis	RealEstate10K	PSNR 29.67	178
Novel View Synthesis	RE10K	SSIM 72.9	161
Novel View Synthesis	DL3DV (test)	PSNR 19.855	83
Novel View Synthesis	Mip-NeRF360 (test)	PSNR 20.09	80
Novel View Synthesis	Re10K (test)	PSNR 22.91	79
Novel View Synthesis	DL3DV	PSNR 21.73	75
Novel View Synthesis	ScanNet++	PSNR 20.25	74
Novel View Synthesis	DL3DV 6view	PSNR 17.09	34
Novel View Synthesis	RealEstate-10K 2-view	PSNR 29.67	32

Showing 10 of 76 rows
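Most results above are reported as PSNR in dB, the same metric behind the abstract's "1.5 to 3.5 dB" improvement. The short sketch below shows the standard definition, PSNR = 10·log10(MAX² / MSE). It assumes images scaled to [0, 1] (`max_val=1.0`); the benchmarks' exact evaluation protocols (resolution, cropping, color range) are not given here. It also shows how a dB gain translates into a lower mean squared error. The function names are illustrative, not from any particular library.

```python
import math

import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_reduction_factor(delta_db):
    """How many times smaller the MSE becomes for a PSNR gain of delta_db."""
    return 10.0 ** (delta_db / 10.0)


target = np.full((4, 4, 3), 0.1)
pred = np.zeros((4, 4, 3))
print(round(psnr(pred, target), 6))  # MSE = 0.01, so 20.0
print(round(mse_reduction_factor(1.5), 2), round(mse_reduction_factor(3.5), 2))  # 1.41 2.24
```

By this definition, a 1.5 to 3.5 dB PSNR gain corresponds to roughly 1.4x to 2.2x lower mean squared error against the ground-truth views.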

...
