MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

About

We present MotionCrafter, a framework that leverages video generators to jointly reconstruct 4D geometry and estimate dense motion from a monocular video. The key idea is a joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, together with a 4D VAE tailored to learn this representation effectively. Unlike prior work that strictly aligns 3D values and latents with RGB VAE latents, despite their fundamentally different distributions, we show that such alignment is unnecessary and can hurt performance. Instead, we propose a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments on multiple datasets show that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering 38.64% and 25.0% improvements in geometry and motion reconstruction, respectively, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page
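To make the joint representation concrete, here is a minimal NumPy sketch of stacking per-pixel point maps and scene flows into one array and normalizing them with a shared scale. It is only an illustration under stated assumptions: the function names `pack_joint` and `normalize_joint`, the (T, H, W, 3) array layout, and the center-and-scale normalization are hypothetical and are not taken from the paper, whose actual normalization scheme is not described in this abstract.

```python
import numpy as np


def pack_joint(points, flow):
    """Stack point maps and scene flow into one (T, H, W, 6) array.

    points: (T, H, W, 3) per-pixel 3D positions in a shared coordinate system.
    flow:   (T, H, W, 3) per-pixel 3D displacement to the next frame.
    """
    assert points.shape == flow.shape and points.shape[-1] == 3
    return np.concatenate([points, flow], axis=-1)


def normalize_joint(joint, eps=1e-8):
    """Hypothetical normalization (an assumption, not the paper's scheme).

    Points are centered on their mean and divided by their mean distance
    from that center. Flow is a displacement, so it is only divided by the
    same scale and never shifted.
    """
    points, flow = joint[..., :3], joint[..., 3:]
    center = points.reshape(-1, 3).mean(axis=0)
    centered = points - center
    scale = np.linalg.norm(centered, axis=-1).mean() + eps
    normalized = np.concatenate([centered / scale, flow / scale], axis=-1)
    return normalized, center, scale
```

Because both halves share one scale, adding the normalized flow to the normalized points still gives the normalized position in the next frame, which is the property a shared coordinate system is meant to preserve.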

Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Tracking | ADT | AJ | 44.6 | 14 |
| World-centric geometry reconstruction | Dynamic Replica | delta^p | 99 | 13 |
| Dense Tracking | Kubric | EPE | 4.6 | 11 |
| 3D sparse tracking | Dynamic Replica (DR) | AJ | 49.26 | 9 |
| 3D sparse tracking | Panoptic Studio | AJ | 50.44 | 9 |
| 3D sparse tracking | Point Odyssey (PO) | AJ | 0.4197 | 9 |
| 3D dense tracking | Kubric (test) | AJ | 21.76 | 9 |
| Geometric Reconstruction | Monkaa (test) | Relp | 25.88 | 8 |
| Geometric Reconstruction | Sintel (test) | Relp | 32.46 | 8 |
| Geometric Reconstruction | DDAD (test) | Relp | 21.27 | 8 |
Showing 10 of 22 rows
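
For readers unfamiliar with the EPE metric in the table, the sketch below computes end-point error in its usual sense: the mean Euclidean distance between predicted and ground-truth flow vectors. The function name `end_point_error` and the optional `valid` mask are illustrative assumptions. The listing does not state the benchmark's exact protocol (units, masking, or alignment), so this should not be read as the evaluation code behind the reported numbers.

```python
import numpy as np


def end_point_error(pred_flow, gt_flow, valid=None):
    """Mean Euclidean distance between predicted and ground-truth flow.

    pred_flow, gt_flow: arrays of shape (..., 3) holding 3D flow vectors.
    valid: optional boolean mask of shape (...) selecting pixels to score
           (an assumed convention, not taken from the benchmark).
    """
    err = np.linalg.norm(pred_flow - gt_flow, axis=-1)
    if valid is not None:
        err = err[valid]
    return float(err.mean())
```

Lower values are better: an EPE of zero means every predicted flow vector matches the ground truth exactly.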
