High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion

About

Despite recent advances in Novel View Synthesis (NVS), generating high-fidelity views from single or sparse observations remains a significant challenge. Existing splatting-based approaches often produce distorted geometry due to splatting errors. While diffusion-based methods leverage rich 3D priors to achieve improved geometry, they often suffer from texture hallucination. In this paper, we introduce SplatDiff, a pixel-splatting-guided video diffusion model designed to synthesize high-fidelity novel views from a single image. Specifically, we propose an aligned synthesis strategy for precise control of target viewpoints and geometry-consistent view synthesis. To mitigate texture hallucination, we design a texture bridge module that enables high-fidelity texture generation through adaptive feature fusion. In this manner, SplatDiff leverages the strengths of splatting and diffusion to generate novel views with consistent geometry and high-fidelity details. Extensive experiments verify the state-of-the-art performance of SplatDiff in single-view NVS. Additionally, without extra training, SplatDiff shows remarkable zero-shot performance across diverse tasks, including sparse-view NVS and stereo video conversion.

Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers • 2025

Related benchmarks

Task                     Dataset               Result         Rank
Stereo Image Conversion  Marvel-10K            PSNR 36.23     14
Stereo Conversion        Mono2Stereo           PSNR 32.37     14
Stereo Video Conversion  Marvel-10K            PSNR 36.24     8
Stereo Image Conversion  Mono2Stereo (test)    S-PSNR 24.78   6
Stereo Video Conversion  Marvel-10K (test)     S-PSNR 26.57   6
Novel View Synthesis     AIM-500 (test)        FID 19.26      5
Novel View Synthesis     P3M-10K (test)        FID 21.61      5
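
Several of the results above are reported as PSNR (peak signal-to-noise ratio, in dB; higher is better). As a rough illustration of what that number measures, the sketch below computes the standard PSNR between a reference view and a synthesized view given as flat lists of pixel intensities. The function name `psnr` and the 8-bit peak value of 255 are assumptions made for this example; the exact evaluation protocol behind the listed benchmarks (color space, cropping, per-frame averaging) is not stated here, and S-PSNR is a separate metric that this sketch does not cover.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Standard PSNR in dB between two equally sized flat pixel sequences.

    Illustrative only: `max_value` assumes 8-bit intensities, which may not
    match the protocol used by the benchmarks listed above.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, rendered)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [100, 100, 100, 100]
    synthesized = [110, 90, 110, 90]
    print(f"PSNR: {psnr(ground_truth, synthesized):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the error.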