NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

About

By harnessing the generative capabilities of large pre-trained video diffusion models, we propose NVS-Solver, a new paradigm for novel view synthesis (NVS) that operates without the need for training. NVS-Solver adaptively modulates the diffusion sampling process with the given views to synthesize novel views from single or multiple views of static scenes, or from monocular videos of dynamic scenes. Specifically, building on our theoretical modeling, we iteratively modulate the score function with scene priors, represented as warped input views, to control the video diffusion process. Moreover, by theoretically analyzing a bound on the estimation error, we make the modulation adaptive to the view pose and the number of diffusion steps. Extensive evaluations on both static and dynamic scenes show that NVS-Solver significantly outperforms state-of-the-art methods both quantitatively and qualitatively. Source code: https://github.com/ZHU-Zhiyu/NVS_Solver.
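The idea of modulating the sampling process with warped views can be illustrated with a minimal sketch. It assumes a standard ε-prediction diffusion model sampled with deterministic DDIM (eta = 0) and treats each image as a flat list of pixel values. The function names, the blending rule, and the `guidance_weight` schedule are hypothetical and are not taken from the NVS-Solver code; the source derives its adaptive weights from a bound on the estimation error, while the exponential decay below is only a placeholder. Because the clean-sample estimate and the score are linked by Tweedie's formula, pulling the estimate toward the warped view inside its validity mask is one simple way to shift the score.

```python
import math
from typing import List


def guidance_weight(t: int, T: int, pose_dist: float,
                    lam_max: float = 1.0, dist_scale: float = 1.0) -> float:
    """Hypothetical placeholder schedule (not the paper's formula).

    Trusts the warped view more when the target pose is close to a given
    view (small pose_dist) and at noisier steps (t close to T).
    """
    return lam_max * (t / T) * math.exp(-pose_dist / dist_scale)


def modulated_ddim_step(x_t: List[float], eps_hat: List[float],
                        warped: List[float], mask: List[float],
                        alpha_bar_t: float, alpha_bar_prev: float,
                        lam: float) -> List[float]:
    """One deterministic DDIM step whose clean-sample estimate is pulled
    toward the warped input view wherever mask == 1.

    Assumes 0 < alpha_bar_t < 1 and 0 < alpha_bar_prev <= 1.
    """
    sa_t, s1_t = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    sa_p, s1_p = math.sqrt(alpha_bar_prev), math.sqrt(1.0 - alpha_bar_prev)
    out = []
    for x, e, w, m in zip(x_t, eps_hat, warped, mask):
        # Denoiser's estimate of the clean pixel (Tweedie / epsilon form).
        x0_hat = (x - s1_t * e) / sa_t
        # Blend toward the warped pixel only where the warp is valid.
        x0_mod = x0_hat + lam * m * (w - x0_hat)
        # Noise (equivalently, score) implied by the modulated estimate.
        eps_mod = (x - sa_t * x0_mod) / s1_t
        # Standard DDIM update using the modulated quantities.
        out.append(sa_p * x0_mod + s1_p * eps_mod)
    return out
```

For example, `guidance_weight(50, 100, 0.0)` returns 0.5, so halfway through sampling a target pose that coincides with a given view moves the estimate halfway toward the warped pixels; pixels with `mask == 0` follow the unmodified DDIM trajectory.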

Meng You, Zhiyu Zhu, Hui Liu, Junhui Hou • 2024

Related benchmarks

Task	Dataset	Metric	Result
Novel View Synthesis	LLFF	PSNR	11.99	144
Novel View Synthesis	Mip-NeRF 360	PSNR	12.45	143
Stereo Image Conversion	Marvel-10K	PSNR	31.18	14
Novel View Synthesis	Tanks&Temples	PSNR	12.356	10
Dynamic Monocular Video Novel View Synthesis	DAVIS (test)	FID (192x192)	2.654	9
Novel View Synthesis	LLFF	PSNR	11.418	9
Novel View Synthesis	MipNeRF360 K=9 view-count split	H Score	8	8
Novel View Synthesis	MipNeRF360 K=3 view-count	H	7	8
Novel View Synthesis	MipNeRF360 K=6 view-count	H	7	8
Novel View Synthesis (3D Consistency Evaluation)	DL3DV K=3 views	H (Human Ranking)	7	8

The table lists 10 of the 19 related benchmark results.
