MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents

About

We present MORPHOS, a novel autoregressive framework that generates dynamic 3D assets from videos across diverse representations, including meshes, 3D Gaussians, and radiance fields. Existing methods are typically limited to a single representation, struggle to model topological changes, or fail to maintain temporal consistency over long videos. To address these limitations, we introduce the Temporal Structured Latents (T-SLAT), a unified 4D representation that jointly encodes geometry and appearance along the temporal dimension. Leveraging T-SLAT, MORPHOS autoregressively generates dynamic 3D assets via causal attention, conditioning each frame on its preceding history to ensure temporal consistency while handling evolving topologies. We also propose a temporal-structural augmentation to mitigate error accumulation in autoregressive generation. MORPHOS achieves state-of-the-art performance in appearance and competitive results in geometry across multiple benchmarks, demonstrating superior generalization across various representations and robustness in long-horizon generation.
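The frame-level causal conditioning described above can be illustrated with an attention mask. The following is a minimal sketch, not the authors' implementation. It assumes every frame contributes the same number of latent tokens and that tokens attend freely within their own frame; the abstract states neither. The function name `frame_causal_mask` and its parameters are hypothetical.

```python
def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query token i may attend to key token j.

    Tokens are laid out frame by frame. A token may attend to every token in
    its own frame and in all earlier frames, but to none in later frames.
    Assumption: a fixed token count per frame (hypothetical simplification).
    """
    total = num_frames * tokens_per_frame
    mask = []
    for i in range(total):
        frame_i = i // tokens_per_frame  # frame index of the query token
        # Allow a key token only if its frame is not later than the query's frame.
        row = [(j // tokens_per_frame) <= frame_i for j in range(total)]
        mask.append(row)
    return mask


# Small demonstration: 3 frames with 2 tokens each.
mask = frame_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid, each row is a query token and each `x` marks a key token it may attend to. Tokens of the first frame see only that frame, while tokens of the last frame see the full history, which is the sense in which each frame is conditioned on its predecessors.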

Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim, JongMin Lee, Seungryong Kim • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| 4D Generation | Consistent4D | LPIPS 0.1531 | 40 |
| 4D Generation | Motion80 (Short) | LPIPS 0.1505 | 6 |
| 4D Generation | Motion80 (Long) | LPIPS 0.1494 | 6 |
| Dynamic 3D Generation | ActionBench | LPIPS 0.1904 | 6 |