
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

About

We introduce MotionCrafter, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense motion from a monocular video. The core of our method is a novel joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, together with a novel 4D VAE that learns this representation effectively. Prior work forces 3D values and latents to align strictly with RGB VAE latents, despite their fundamentally different distributions; we show that such alignment is unnecessary and leads to suboptimal performance. Instead, we introduce a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments across multiple datasets demonstrate that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering improvements of 38.64% in geometry reconstruction and 25.0% in motion reconstruction, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page
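To make the joint representation concrete, here is a minimal sketch of a per-pixel point map paired with a per-pixel scene flow in one shared coordinate frame. The `JointFrame` container and the `advect` helper are hypothetical names chosen for illustration, and the grid-of-tuples layout is an assumption; this is not the authors' data format or their 4D VAE.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class JointFrame:
    """Hypothetical container: one 3D point and one 3D flow vector per pixel."""
    points: List[List[Vec3]]  # H x W grid of 3D points in the shared frame
    flows: List[List[Vec3]]   # H x W grid of 3D displacements to the next frame


def advect(frame: JointFrame) -> List[List[Vec3]]:
    """Add each pixel's scene flow to its point to get its position at t+1."""
    return [
        [tuple(p + f for p, f in zip(point, flow))
         for point, flow in zip(point_row, flow_row)]
        for point_row, flow_row in zip(frame.points, frame.flows)
    ]


# A 1 x 2 "image": the first pixel is static, the second moves right and closer.
frame = JointFrame(
    points=[[(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]],
    flows=[[(0.0, 0.0, 0.0), (0.5, 0.0, -0.5)]],
)
moved = advect(frame)
```

Because points and flows live in the same coordinate system, adding them needs no per-frame camera transform, which is what makes a single joint encoding of geometry and motion plausible.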

Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Dense Tracking | Kubric | EPE | 4.6 | 11 |
| Geometric Reconstruction | Monkaa (test) | Relp | 25.88 | 8 |
| Geometric Reconstruction | Sintel (test) | Relp | 32.46 | 8 |
| Geometric Reconstruction | DDAD (test) | Relp | 21.27 | 8 |
| World-centric geometry reconstruction | Kubric | Rel^p | 3.4 | 7 |
| World-centric geometry reconstruction | Dynamic Replica | Rel^p | 4.04 | 7 |
| World-centric geometry reconstruction | Point Odyssey | Rel^p | 9.94 | 7 |
| World-centric motion reconstruction | vKITTI 2 | EPE | 71.75 | 7 |
| World-centric geometry reconstruction | VKITTI2 | Rel^p | 14.6 | 7 |
| World-centric motion reconstruction | Spring | Endpoint Error | 5.61 | 7 |
Showing 10 of 13 rows
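
For readers unfamiliar with the EPE (endpoint error) metric reported in the motion and tracking rows, the sketch below computes it in its standard form: the mean Euclidean distance between predicted and ground-truth 3D flow vectors. The function name `endpoint_error` and the sample vectors are illustrative assumptions; the units, scaling, and masking used on this benchmark page are not stated in the source and are not reproduced here.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def endpoint_error(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Illustrative values only: errors of 1.0 and 5.0 average to 3.0.
pred = [(1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
epe = endpoint_error(pred, gt)
```

Lower values are better, since the metric measures how far each predicted motion vector lands from the true one.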
