MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds

About

We introduce 4D Motion Scaffolds (MoSca), a modern 4D reconstruction system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild. To address this challenging and ill-posed inverse problem, we leverage prior knowledge from foundational vision models and lift the video data to a novel Motion Scaffold (MoSca) representation, which compactly and smoothly encodes the underlying motions and deformations. The scene geometry and appearance are then disentangled from the deformation field: they are encoded by globally fusing Gaussians anchored onto the MoSca and optimized via Gaussian Splatting. Additionally, the camera focal length and poses can be solved using bundle adjustment, without the need for any other pose estimation tools. Experiments demonstrate state-of-the-art performance on dynamic rendering benchmarks and effectiveness on real videos.
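To make the idea of Gaussians "anchored onto" a sparse motion scaffold more concrete, here is a minimal sketch of how a single 3D point could inherit motion from nearby scaffold nodes. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the blending scheme, so this uses simple distance-weighted linear blending of per-node rigid transforms as a stand-in, and the function name `blend_node_motion` and the parameters `k` and `sigma` are hypothetical.

```python
import numpy as np


def blend_node_motion(point, node_pos, node_R, node_t, k=2, sigma=0.5):
    """Move one 3D point using its k nearest scaffold nodes (illustrative only).

    point:    (3,)      point position at the source time
    node_pos: (N, 3)    node positions at the source time
    node_R:   (N, 3, 3) per-node rotations from source to target time
    node_t:   (N, 3)    per-node translations from source to target time
    """
    # Distance from the point to every node, then keep the k closest.
    dist = np.linalg.norm(node_pos - point, axis=1)
    nearest = np.argsort(dist)[:k]

    # Gaussian weights, shifted by the smallest squared distance so the
    # closest node always gets weight 1 before normalization (no underflow).
    d2 = dist[nearest] ** 2
    w = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
    w = w / w.sum()

    # Each node proposes a new position: rotate the point about the node,
    # then apply the node's translation.
    local = point - node_pos[nearest]
    moved = (
        np.einsum("kij,kj->ki", node_R[nearest], local)
        + node_pos[nearest]
        + node_t[nearest]
    )

    # Assumption: blend the proposed positions linearly.
    return (w[:, None] * moved).sum(axis=0)


# Two nodes; the first moves one unit along x, the second stays put.
nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
rotations = np.stack([np.eye(3), np.eye(3)])
translations = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
midpoint = np.array([0.5, 0.0, 0.0])
new_position = blend_node_motion(midpoint, nodes, rotations, translations)
```

A point halfway between the two nodes receives equal weights, so it moves by half of the first node's translation. Because only the sparse nodes carry motion, many Gaussians can share a small set of trajectories, which is one way to read the abstract's "compact and smooth" description.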

Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, Kostas Daniilidis • 2024

Related benchmarks

Task	Dataset	Result
Point Cloud Reconstruction	DyCheck	Accuracy (Mean) 0.732	40
Novel View Synthesis	iPhone dataset	SSIM 0.706	33
Novel View Synthesis	iPhone DyCheck 7 scenes 2x resolution	mPSNR 19.32	31
Novel View Synthesis	NVIDIA (test)	PSNR 26.72	29
4D Reconstruction	DyCheck (test)	mPSNR 19.54	21
Camera pose estimation	DyCheck	ATE 0.024	21
Dynamic Novel View Synthesis	DyCheck 5 scenes 1.0	mPSNR 18.4	20
Novel View Synthesis	NVIDIA	PSNR 26.76	20
Camera pose estimation	TUM-dynamics (test)	ATE 0.031	18
Dynamic Scene Novel View Synthesis	NVIDIA video dataset average over all scenes 112	PSNR 26.72	17

Showing 10 of 58 rows
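
Several rows above report PSNR or mPSNR in dB. As a reminder of what the number measures, the sketch below computes PSNR between a rendered image and a reference, with an optional pixel mask. It assumes float images scaled to [0, 1]. Reading "mPSNR" as PSNR restricted to a validity mask (as in DyCheck-style evaluation) is an assumption here, and the function name `psnr` and its parameters are illustrative rather than taken from any benchmark's code.

```python
import numpy as np


def psnr(pred, target, mask=None, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images.

    pred, target: arrays of identical shape, e.g. (H, W, 3), values in [0, max_val]
    mask:         optional (H, W) boolean array; only True pixels are scored
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    sq_err = (pred - target) ** 2

    if mask is not None:
        # Boolean indexing on the first two axes keeps all color channels
        # of the selected pixels.
        sq_err = sq_err[np.asarray(mask, dtype=bool)]
        if sq_err.size == 0:
            raise ValueError("mask selects no pixels")

    mse = sq_err.mean()
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
reference = np.full((4, 4, 3), 0.1)
render = np.zeros((4, 4, 3))
score = psnr(render, reference)
```

Higher is better. Because the scale is logarithmic, a difference of a few tenths of a dB between methods corresponds to a small relative change in mean squared error.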
