
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

About

3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
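The fusion step mentioned in the abstract can be pictured with a small sketch. It assumes, purely for illustration, that the backbone emits one feature vector per output pixel for each view, and that each vector parameterizes one Gaussian. The sizes, the 14-channel layout, and the helper names `make_dummy_feature_map` and `fuse_views` are hypothetical and not taken from the text above. Under those assumptions, fusing amounts to flattening each view's grid and concatenating the results into a single list that a differentiable renderer could consume.

```python
from typing import List

# Hypothetical sizes for illustration only; not stated in the abstract.
NUM_VIEWS = 4
HEIGHT = 2
WIDTH = 3
CHANNELS = 14  # assumed per-Gaussian parameter count (position, scale, rotation, opacity, color)

Feature = List[float]
FeatureMap = List[List[Feature]]  # indexed [row][col] -> feature vector


def make_dummy_feature_map(view_index: int) -> FeatureMap:
    """Build a placeholder H x W map whose vectors encode (view, row, col)."""
    return [
        [[float(view_index), float(r), float(c)] + [0.0] * (CHANNELS - 3)
         for c in range(WIDTH)]
        for r in range(HEIGHT)
    ]


def fuse_views(feature_maps: List[FeatureMap]) -> List[Feature]:
    """Flatten every view's pixel grid and concatenate into one Gaussian list."""
    fused: List[Feature] = []
    for fmap in feature_maps:
        for row in fmap:
            fused.extend(row)
    return fused


views = [make_dummy_feature_map(v) for v in range(NUM_VIEWS)]
gaussians = fuse_views(views)
print(len(gaussians), len(gaussians[0]))
```

In this sketch the number of Gaussians grows as views × height × width, which may help explain why the choice of output resolution matters for cost.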

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-3D | Toys4k | CLIP Score | 24.83 | 25 |
| Novel View Synthesis | Google Scanned Objects (GSO) (test) | PSNR | 16.677 | 24 |
| Text-to-3D Generation | GPTEval3D 110 prompts 1.0 | GPTEval3D Alignment | 1.09e+3 | 20 |
| Single-view 3D Reconstruction | GSO (test) | CD | 0.196 | 18 |
| 3D Asset Reconstruction | Toys4k | CD | 0.566 | 18 |
| 3D Shape Reconstruction | OmniObject3D | CD | 0.114 | 17 |
| 3D Reconstruction | Google Scanned Objects (GSO) (test) | LPIPS | 0.063 | 17 |
| Novel View Synthesis | Objaverse | PSNR | 14.81 | 17 |
| 3D Character Generation | Anime3D++ (test) | SSIM | 87.6 | 16 |
| Image-to-3D Generation | Google Scanned Objects (GSO) | CLIP Similarity | 50.3 | 14 |
Showing 10 of 81 rows