Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation

About

Automatic 3D content creation has progressed rapidly in recent years owing to the availability of pre-trained large language models and image diffusion models, forming the emerging topic of text-to-3D content creation. Existing text-to-3D methods commonly use implicit scene representations, which couple geometry and appearance via volume rendering and are suboptimal for recovering finer geometry and achieving photorealistic rendering; consequently, they are less effective for generating high-quality 3D assets. In this work, we propose Fantasia3D, a new method for high-quality text-to-3D content creation. Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance. For geometry learning, we rely on a hybrid scene representation and propose to encode the surface normal extracted from the representation as the input to the image diffusion model. For appearance modeling, we introduce the spatially varying bidirectional reflectance distribution function (BRDF) into the text-to-3D task and learn the surface material for photorealistic rendering of the generated surface. Our disentangled framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets. We conduct thorough experiments that show the advantages of our method over existing ones under different text-to-3D task settings. Project page and source code: https://fantasia3d.github.io/.

Rui Chen, Yongwei Chen, Ningxin Jiao, Kui Jia • 2023

Related benchmarks

Task	Dataset	Metric	Result	
Text-to-3D Generation	GPTEval3D 110 prompts	CP	0.22	20
Text-to-3D Generation	GPTEval3D 110 prompts 1.0	GPTEval3D Alignment	1.07e+3	20
Text-to-3D Generation	T³Bench Multiple Objects	Quality Score	22.7	16
Text-to-3D Generation	T³Bench Single Object with Surroundings	BRISQUE	69.6	14
Text-to-3D Generation	T3Bench (test)	Single Object Score	26.4	14
Text-to-3D Generation	Objaverse	CLIP Score	0.207	12
Text-to-3D Generation	T³Bench Single Object	Alignment Score	23.5	11
Text-to-3D Hand Generation	30 text prompts for hand generation	CLIP L14	20.93	10
Text-to-3D Human Generation	Human Generation Benchmark Text-to-3D	CLIP L14 Score	20.93	10
3D Human Generation	User Study 30 prompts	Q1 Best Preference Rate	9.91	8

Showing 10 of 24 rows
