Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

About

Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demands. We present Sculpt4D, a native 4D generative framework that integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design faithfully models complex spatiotemporal dependencies while sidestepping the quadratic overhead of full attention, reducing the network's total computation by 56%. Consequently, Sculpt4D establishes a new state of the art in temporally coherent 4D synthesis and charts a path toward efficient and scalable 4D generation.
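
To make the masking idea concrete, the following is a minimal sketch of a frame-level block mask that anchors every frame to the initial frame and thins out connections as temporal distance grows. The abstract does not specify the actual mask, so everything here is an assumption for illustration: the function names (frame_block_mask, expand_to_tokens), the power-of-two offset schedule used to approximate "time decay", and the parameter max_offset_power are all hypothetical and are not taken from Sculpt4D.

```python
import numpy as np


def frame_block_mask(num_frames: int, max_offset_power: int = 2) -> np.ndarray:
    """Boolean [T, T] mask; True means frame i may attend to frame j.

    Illustrative assumption, not the paper's mask: each frame attends to
    itself, to frame 0 (the anchor), and to frames at temporal offsets
    1, 2, 4, ..., so connections become sparser with distance.
    """
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    mask[:, 0] = True             # every frame sees the initial frame
    np.fill_diagonal(mask, True)  # every frame sees itself
    offsets = [2 ** k for k in range(max_offset_power + 1)]
    for i in range(num_frames):
        for d in offsets:
            for j in (i - d, i + d):
                if 0 <= j < num_frames:
                    mask[i, j] = True
    return mask


def expand_to_tokens(frame_mask: np.ndarray, tokens_per_frame: int) -> np.ndarray:
    """Expand a frame-level mask into a token-level block mask."""
    rows = np.repeat(frame_mask, tokens_per_frame, axis=0)
    return np.repeat(rows, tokens_per_frame, axis=1)


frame_mask = frame_block_mask(num_frames=8, max_offset_power=2)
token_mask = expand_to_tokens(frame_mask, tokens_per_frame=4)
density = frame_mask.mean()  # fraction of frame pairs that are attended
```

Because whole frame-to-frame blocks are either kept or dropped, a block-sparse attention kernel can skip the dropped blocks entirely. The density of this toy mask is unrelated to the 56% computation reduction reported above, which refers to the full network.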

Minghao Yin, Wenbo Hu, Jiale Xu, Ying Shan, Kai Han • 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
4D Generation	Objaverse holdout (test)	Chamfer Distance	0.1052	6
4D Generation	4D Generation Sequences	LPIPS	0.094	5
