MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
About
We present MotionCrafter, a framework that leverages video generators to jointly reconstruct 4D geometry and estimate dense motion from a monocular video. The key idea is a joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, together with a 4D VAE tailored to learn this representation effectively. Prior work strictly aligns 3D values and latents with RGB VAE latents, despite their fundamentally different distributions. We show that such alignment is unnecessary and can hurt performance. Instead, we propose a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments on multiple datasets show that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering improvements of 38.64% in geometry reconstruction and 25.0% in motion reconstruction, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page
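To make the joint representation more concrete, here is a minimal sketch of what it means to keep point maps and scene flow in one coordinate system. It is not the MotionCrafter implementation. The function names, the 6-channel layout, and the toy data are assumptions made for illustration, and plain Python lists stand in for the tensors a real pipeline would use.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Grid = List[List[Vec3]]  # H x W grid holding one 3-vector per pixel


def stack_points_and_flow(points: Grid, flow: Grid) -> List[List[Tuple[float, ...]]]:
    """Concatenate per-pixel 3D position and 3D flow into one 6-channel map.

    Hypothetical layout (x, y, z, dx, dy, dz); the paper's actual encoding
    and normalization are not specified here.
    """
    return [[p + f for p, f in zip(prow, frow)] for prow, frow in zip(points, flow)]


def advect(points: Grid, flow: Grid) -> List[List[Tuple[float, ...]]]:
    """Displace each 3D point by its scene flow.

    Simple addition is valid only because positions and flow are expressed
    in the same (shared) coordinate system.
    """
    return [
        [tuple(pc + fc for pc, fc in zip(p, f)) for p, f in zip(prow, frow)]
        for prow, frow in zip(points, flow)
    ]


# Toy 1x2 "image": two pixels with 3D positions at time t.
points_t: Grid = [[(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]]
# Per-pixel displacement from time t to t+1, in the same coordinates.
flow_t: Grid = [[(0.0, 0.0, 0.0), (0.5, 0.0, -0.25)]]

joint = stack_points_and_flow(points_t, flow_t)
points_next = advect(points_t, flow_t)
```

In this toy case the first pixel is static and the second moves, so `points_next` differs from `points_t` only at the second pixel. If flow lived in a per-frame camera coordinate system instead, a pose transform would be needed before the addition.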
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Tracking | ADT | AJ 44.6 | 14 |
| World-centric geometry reconstruction | Dynamic Replica | delta^p 99 | 13 |
| Dense Tracking | Kubric | EPE 4.6 | 11 |
| 3D sparse tracking | Dynamic Replica (DR) | AJ 49.26 | 9 |
| 3D sparse tracking | Panoptic Studio | AJ 50.44 | 9 |
| 3D sparse tracking | Point Odyssey (PO) | AJ 0.4197 | 9 |
| 3D dense tracking | Kubric (test) | AJ 21.76 | 9 |
| Geometric Reconstruction | Monkaa (test) | Relp 25.88 | 8 |
| Geometric Reconstruction | Sintel (test) | Relp 32.46 | 8 |
| Geometric Reconstruction | DDAD (test) | Relp 21.27 | 8 |