Scaling View Synthesis Transformers

About

Geometry-free view synthesis transformers have recently achieved state-of-the-art performance in Novel View Synthesis (NVS), outperforming traditional approaches that rely on explicit geometry modeling. Yet the factors governing their scaling with compute remain unclear. We present a systematic study of scaling laws for view synthesis transformers and derive design principles for training compute-optimal NVS models. Contrary to prior findings, we show that encoder-decoder architectures can be compute-optimal; we trace earlier negative results to suboptimal architectural choices and comparisons across unequal training compute budgets. Across several compute levels, we demonstrate that our encoder-decoder architecture, which we call the Scalable View Synthesis Model (SVSM), scales as effectively as decoder-only models, achieves a superior performance-compute Pareto frontier, and surpasses the previous state-of-the-art on real-world NVS benchmarks with substantially reduced training compute.
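
As a rough illustration of what a performance-compute Pareto frontier means, the sketch below keeps only the training runs that no other run beats on both axes (lower compute, higher quality). The function name `pareto_frontier` and the sample numbers are hypothetical and made up for illustration; they are not results or code from the paper.

```python
def pareto_frontier(runs):
    """Return the runs that are not dominated by any other run.

    Each run is a (compute, psnr) pair. Lower compute and higher PSNR
    are both better, so a run is kept only if its PSNR exceeds that of
    every run with equal or lower compute.
    """
    frontier = []
    best_psnr = float("-inf")
    # Sort by compute ascending; on ties, put the higher PSNR first.
    for compute, psnr in sorted(runs, key=lambda r: (r[0], -r[1])):
        if psnr > best_psnr:
            frontier.append((compute, psnr))
            best_psnr = psnr
    return frontier


# Made-up (compute, PSNR) pairs for illustration only.
runs = [(1.0, 24.0), (2.0, 23.5), (2.0, 26.0), (4.0, 25.0), (8.0, 28.0)]
frontier = pareto_frontier(runs)
print(frontier)
```

Under this definition, one model family has a superior frontier when, at each compute budget, its best run reaches a higher PSNR than the other family's best run.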

Evan Kim, Hyunwoo Ryu, Thomas W. Mitchel, Vincent Sitzmann • 2026

Related benchmarks

Task	Dataset	Result	Rank
View Synthesis	Re10K (test)	PSNR 30.01	30
Novel View Synthesis	Novel View Synthesis Evaluation Set	PSNR 30.01	2
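
The results above are reported as PSNR (peak signal-to-noise ratio), which follows the standard definition 10 * log10(MAX^2 / MSE). A minimal sketch of that formula is given below, assuming pixel intensities in [0, 1] stored as flat Python lists; the function name `psnr` and its `max_value` parameter are illustrative, and the evaluation protocol actually used for these benchmarks is not specified here.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    Assumes intensities lie in [0, max_value]. Higher is better; identical
    inputs give infinity.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy two-pixel example: each pixel is off by 0.1, so MSE is about 0.01.
value = psnr([0.0, 1.0], [0.1, 0.9])
print(round(value, 2))
```

Because the scale is logarithmic, each halving of the mean squared error raises PSNR by about 3 dB.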
