Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

About

We present Motion 3-to-4, a feed-forward framework for synthesising high-quality 4D dynamic objects from a single monocular video and an optional 3D reference mesh. While recent advances have significantly improved 2D, video, and 3D content generation, 4D synthesis remains difficult due to limited training data and the inherent ambiguity of recovering geometry and motion from a monocular viewpoint. Motion 3-to-4 addresses these challenges by decomposing 4D synthesis into static 3D shape generation and motion reconstruction. Using a canonical reference mesh, our model learns a compact motion latent representation and predicts per-frame vertex trajectories to recover complete, temporally coherent geometry. A scalable frame-wise transformer further makes the model robust to varying sequence lengths. Evaluations on both standard benchmarks and a new dataset with accurate ground-truth geometry show that Motion 3-to-4 delivers superior fidelity and spatial consistency compared to prior work. The project page is available at https://motion3-to-4.github.io/.

Hongyuan Chen, Xingyu Chen, Youjia Zhang, Zexiang Xu, Anpei Chen • 2026
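
The abstract describes representing a 4D object as a canonical reference mesh plus per-frame vertex trajectories. The following is a minimal NumPy sketch of that data layout only, not the authors' model: the function name `animate_canonical_mesh`, the array shapes, and the use of additive displacements are assumptions made for illustration. It shows why a shared canonical mesh gives temporally coherent geometry: every frame reuses the same vertices and faces, so only the positions change.

```python
import numpy as np


def animate_canonical_mesh(canonical_vertices, faces, displacements):
    """Build per-frame vertex positions from a canonical mesh.

    Hypothetical helper for illustration (not from the paper).
    canonical_vertices: (V, 3) rest-pose vertex positions.
    faces:              (F, 3) vertex indices, shared by every frame.
    displacements:      (T, V, 3) assumed per-frame offsets from the rest pose.
    Returns (trajectories, faces) with trajectories of shape (T, V, 3).
    """
    canonical_vertices = np.asarray(canonical_vertices, dtype=np.float64)
    displacements = np.asarray(displacements, dtype=np.float64)
    if displacements.shape[1:] != canonical_vertices.shape:
        raise ValueError("displacements must have shape (T, V, 3)")
    # Broadcasting adds the (V, 3) rest pose to each of the T frames.
    trajectories = canonical_vertices[None, :, :] + displacements
    return trajectories, faces


# Toy example: one triangle sliding along the x-axis over three frames.
canonical = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
offsets = np.zeros((3, 3, 3))
offsets[:, :, 0] = np.array([0.0, 0.1, 0.2])[:, None]  # x-shift per frame

trajectories, shared_faces = animate_canonical_mesh(canonical, faces, offsets)
print(trajectories.shape)
```

Because the face array is never modified, vertex `i` in frame 0 corresponds to vertex `i` in every later frame, which is the correspondence that a trajectory-based representation relies on.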

Related benchmarks

Task	Dataset	Result
4D Generation	Consistent4D	LPIPS 0.2044	40
4D Synthesis	Monocular Video	FPS 6.5	8
4D Mesh Reconstruction	TexVerse (test)	CD-3D 0.056	6
Video-to-4D generation	Helix4DBench 1.0 (test)	ULIP-2 0.4331	6
4D Mesh Reconstruction	ActionBench (test)	CD-3D 0.068	6
4D Generation	Motion80 (Long)	LPIPS 0.2347	6
4D Generation	Motion80 Short	LPIPS 0.2118	6
Dynamic 3D Generation	ActionBench	LPIPS 0.2025	6
4D Motion Modeling	Motion-80 Short Sequence	CD 0.0437	5
4D Motion Modeling	Motion-80 Long Sequence	CD 0.0929	4

Showing 10 of 12 rows
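
Several rows above report CD, presumably Chamfer distance, between predicted and ground-truth geometry. The sketch below shows one common symmetric variant on point sets, assuming unsquared Euclidean distances averaged in each direction. The listing does not state which variant (squared or unsquared, summed or averaged, how points are sampled) the benchmarks use, so values from this function are not directly comparable to the table.

```python
import numpy as np


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets.

    Assumed variant: mean nearest-neighbour Euclidean distance from A to B
    plus the same from B to A. Benchmarks may define it differently.
    points_a: (N, 3) array, points_b: (M, 3) array.
    """
    a = np.asarray(points_a, dtype=np.float64)
    b = np.asarray(points_b, dtype=np.float64)
    # Pairwise distances, shape (N, M). Brute force, fine for small sets.
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = pairwise.min(axis=1).mean()  # each point in A to its nearest in B
    b_to_a = pairwise.min(axis=0).mean()  # each point in B to its nearest in A
    return a_to_b + b_to_a


# Two points, and the same two points lifted by 0.5 along z.
set_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
set_b = set_a + np.array([0.0, 0.0, 0.5])

cd_same = chamfer_distance(set_a, set_a)
cd_shifted = chamfer_distance(set_a, set_b)
print(cd_same, cd_shifted)
```

For a mesh sequence, a per-frame score like this would typically be computed on points sampled from each frame's surface and then averaged over frames; the brute-force pairwise matrix here would need a spatial index (for example a k-d tree) for large point counts.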
