Magic3D: High-Resolution Text-to-3D Content Creation

About

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate the optimization with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.
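As a quick sanity check on the "2x faster" claim, the ratio can be computed from the two timings quoted in the abstract (40 minutes for Magic3D, 1.5 hours on average for DreamFusion). The variable names below are illustrative only and do not come from the paper.

```python
# Timings as quoted in the abstract, converted to minutes.
magic3d_minutes = 40
dreamfusion_minutes = 1.5 * 60  # "1.5 hours on average"

# Ratio of the two run times; the abstract rounds this to "2x".
speedup = dreamfusion_minutes / magic3d_minutes
print(f"speedup: {speedup:.2f}x")  # prints "speedup: 2.25x"
```

The exact ratio is 2.25, so the "2x" in the abstract is a rounded-down figure.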

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin • 2022

Related benchmarks

Task	Dataset	Result
Text-to-3D Generation	GPTEval3D 110 prompts	CP 0.2	20
Text-to-3D Generation	GPTEval3D 110 prompts 1.0	GPTEval3D Alignment 1.15e+3	20
Text-to-3D Generation	T³Bench Multiple Objects	Quality Score 26.6	16
Text-to-3D Generation	MATE-3D	HyperScore Alignment 5.46	15
Text-to-3D Generation	T3Bench (test)	Single Object Score 37	14
Text-to-3D Generation	T³Bench Single Object with Surroundings	BRISQUE 92.8	14
Text-to-3D Generation	T³Bench Single Object	Alignment Score 35.3	11
Single-image normal estimation	Single-image normal estimation efficiency evaluation (test)	Params (M) 73.5	10
Surface Normal Estimation	Surface Normal Estimation Benchmark	MAE 19.2	10
Text-to-3D Generation	T3Bench frozen (300-prompt audit set)	CLIP Score 18.26	10

Showing 10 of 27 rows
