Scaling View Synthesis Transformers

About

Geometry-free view synthesis transformers have recently achieved state-of-the-art performance in Novel View Synthesis (NVS), outperforming traditional approaches that rely on explicit geometry modeling. Yet the factors governing their scaling with compute remain unclear. We present a systematic study of scaling laws for view synthesis transformers and derive design principles for training compute-optimal NVS models. Contrary to prior findings, we show that encoder-decoder architectures can be compute-optimal; we trace earlier negative results to suboptimal architectural choices and comparisons across unequal training compute budgets. Across several compute levels, we demonstrate that our encoder-decoder architecture, which we call the Scalable View Synthesis Model (SVSM), scales as effectively as decoder-only models, achieves a superior performance-compute Pareto frontier, and surpasses the previous state-of-the-art on real-world NVS benchmarks with substantially reduced training compute.

Evan Kim, Hyunwoo Ryu, Thomas W. Mitchel, Vincent Sitzmann • 2026

Related benchmarks

Task | Dataset | Result | Rank
View Synthesis | Re10K (test) | PSNR 30.01 | 23
Novel View Synthesis | Novel View Synthesis Evaluation Set | PSNR 30.01 | 2
