
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

About

The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation. Despite recent advancements in 3D generative models, existing methods still face challenges with optimization speed, geometric fidelity, and the lack of assets for physically based rendering (PBR). In this paper, we introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome these limitations. 3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format, facilitating the modeling of high-resolution geometry with PBR assets. On top of this representation, we propose a generative framework based on the Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion. 3DTopia-XL learns to generate high-quality 3D assets from textual or visual inputs. We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets with fine-grained textures and materials, efficiently bridging the quality gap between generative models and real-world applications.

Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong, Ziang Cao, Fangzhou Hong, Yushi Lan, Tengfei Wang, Haozhe Xie, Tong Wu, Shunsuke Saito, Liang Pan, Dahua Lin, Ziwei Liu • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text-to-3D | Toys4k | CLIP Score | 22.48 | 14
3D Asset Reconstruction | Toys4k | CD | 0.0128 | 11
3D Generation | ImageNet | CLIP Score | 0.646 | 9
3D Generation | Real 3D Datasets GSO, Omni3D, DTC | CLIP | 0.831 | 9
Image-to-3D | Toys4k | FD (Inception) | 37.68 | 8
Text-to-3D | User Study 68 text-to-3D cases (Human Evaluation) | Selection Count | 5 | 8
Text-to-3D Generation | Text-to-3D evaluation prompts | CLIP Score | 24.33 | 7
Image-to-3D | User Study 67 image-to-3D cases (Human Evaluation) | Selection Count | 5 | 7
Text-to-3D Generation | MME-3DR | CLIP Score | 15.9 | 6
Image-to-3D | GSO 300 random samples (test) | KID | 1.38 | 5
