MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
About
We present MoVieS, a Motion-aware View Synthesis model that reconstructs 4D dynamic scenes from monocular videos in one second. It represents dynamic 3D scenes with pixel-aligned Gaussian primitives and explicitly supervises their time-varying motions. This allows, for the first time, unified modeling of appearance, geometry and motion from monocular videos, and enables reconstruction, view synthesis and 3D point tracking within a single learning-based framework. By bridging view synthesis with geometry reconstruction, MoVieS enables large-scale training on diverse datasets with minimal dependence on task-specific supervision. As a result, it also naturally supports a wide range of zero-shot applications, such as scene flow estimation and moving object segmentation. Extensive experiments validate the effectiveness and efficiency of MoVieS across multiple tasks, achieving competitive performance while offering speedups of several orders of magnitude.
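The core idea above — pixel-aligned Gaussian primitives whose centers move over time, with scene flow falling out of the motion predictions — can be sketched in a few lines. This is an illustrative toy, not the MoVieS implementation: the grid size, placeholder depth, and constant-drift "motion head" are all assumptions made for the sake of the example.

```python
import numpy as np

# Toy sketch of pixel-aligned dynamic Gaussians (illustrative assumptions,
# not the actual MoVieS architecture).
H, W, T = 4, 4, 3  # tiny image and number of timesteps

# One Gaussian per pixel: a canonical 3D center lifted from the pixel grid
# using a (placeholder) predicted depth map.
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
depth = np.ones((H, W))
centers = np.stack([xs, ys, depth], axis=-1).reshape(-1, 3).astype(np.float64)

# A motion head would predict per-timestep 3D offsets for every Gaussian;
# here we fake it with a constant drift of 0.1 units/frame along x.
offsets = np.zeros((T, H * W, 3))
offsets[:, :, 0] = np.arange(T)[:, None] * 0.1

# Time-varying Gaussian centers: canonical center + offset at time t.
centers_t = centers[None] + offsets          # shape (T, H*W, 3)

# Zero-shot scene flow is simply the displacement between timesteps.
scene_flow = centers_t[1:] - centers_t[:-1]  # shape (T-1, H*W, 3)
```

In this toy setup, `scene_flow[0, 0]` is `[0.1, 0.0, 0.0]`: the per-pixel flow is recovered directly from the time-varying Gaussian centers, which is how view synthesis and motion estimation can share one representation.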
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | DyCheck (test) | mPSNR | 18.46 | 15 |
| Novel View Synthesis | NVIDIA dataset (test) | Mean PSNR | 19.16 | 9 |
| 3D Point Tracking | Aria Digital Twin | EPE (3D) | 0.2153 | 4 |
| 3D Point Tracking | DriveTrack | EPE_3D | 0.0472 | 4 |
| 3D Point Tracking | Panoptic Studio | EPE_3D | 0.0352 | 4 |