LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

About

3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
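The fusion of multi-view Gaussian features described above can be pictured as flattening per-view feature maps into a single set of Gaussians. The following is a minimal sketch of that idea, not the authors' implementation: the function name `fuse_multi_view_gaussians`, the 14-channel layout (position, scale, rotation quaternion, opacity, color), and the toy shapes are all assumptions made for illustration.

```python
import numpy as np

# Assumed (hypothetical) channel layout for one Gaussian:
# 3 position + 3 scale + 4 rotation quaternion + 1 opacity + 3 RGB = 14
GAUSSIAN_CHANNELS = 14


def fuse_multi_view_gaussians(features: np.ndarray) -> dict:
    """Fuse per-view feature maps of shape (V, H, W, 14) into one Gaussian set.

    Every pixel of every view is treated as one Gaussian, so the fused set
    holds V * H * W Gaussians.
    """
    if features.ndim != 4 or features.shape[-1] != GAUSSIAN_CHANNELS:
        raise ValueError(
            f"expected shape (V, H, W, {GAUSSIAN_CHANNELS}), got {features.shape}"
        )
    # Concatenate all views by flattening the view and pixel axes together.
    flat = features.reshape(-1, GAUSSIAN_CHANNELS)
    return {
        "position": flat[:, 0:3],
        "scale": flat[:, 3:6],
        "rotation": flat[:, 6:10],
        "opacity": flat[:, 10:11],
        "rgb": flat[:, 11:14],
    }


# Toy input: 4 views, each an 8x8 feature map (shapes chosen for illustration).
rng = np.random.default_rng(0)
views = rng.standard_normal((4, 8, 8, GAUSSIAN_CHANNELS))
gaussians = fuse_multi_view_gaussians(views)
```

In a real pipeline the fused attributes would be passed to a differentiable Gaussian renderer; that step is omitted here because it depends on the renderer used.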

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu • 2024

Related benchmarks

Task                             Dataset                              Result                       Rank
Text-to-3D Generation            GPTEval3D 110 prompts 1.0            GPTEval3D Alignment 1.09e+3  20
3D Shape Reconstruction          OmniObject3D                         CD 0.114                     17
3D Reconstruction                Google Scanned Objects (GSO) (test)  LPIPS 0.063                  17
3D Character Generation          Anime3D++ (test)                     SSIM 87.6                    16
Text-to-3D                       Toys4k                               CLIP Score 24.83             14
Single-view 3D Reconstruction    GSO (test)                           CD 0.196                     13
Text-to-3D Generation            Objaverse                            CLIP Score 30.06             12
Image-to-3D Generation           NeRF4                                CLIP-Similarity 0.48         12
3D Asset Reconstruction          Toys4k                               CD 0.566                     11
Image-conditioned 3D Generation  Objaverse (test)                     FID 19.93                    10
Showing 10 of 58 rows
