DreamFusion: Text-to-3D using 2D Diffusion

About

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
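
The optimization loop described above can be illustrated with a toy sketch. It is not the authors' implementation. It assumes the commonly cited form of the distillation gradient, w(t) * (eps_hat(z_t, t) - eps) * dx/dtheta, and replaces every expensive component with a stand-in. The "generator" is the identity map over four scalar parameters rather than a NeRF renderer. The "pretrained diffusion model" is a hypothetical analytic noise predictor, toy_eps_hat, for 1-D data drawn from N(mu, s^2). The cosine noise schedule and the weighting w(t) = sigma^2 are also assumptions chosen for illustration.

```python
import math
import random


def toy_eps_hat(z_t, alpha, sigma, mu=2.0, s=0.5):
    """Exact noise predictor for 1-D data x0 ~ N(mu, s^2).

    Hypothetical stand-in for a frozen, pretrained diffusion model:
    for z_t = alpha * x0 + sigma * eps it returns E[eps | z_t].
    """
    return sigma * (z_t - alpha * mu) / (alpha ** 2 * s ** 2 + sigma ** 2)


def distillation_grad(theta, rng):
    """One stochastic gradient estimate: w(t) * (eps_hat - eps) * dx/dtheta."""
    t = rng.uniform(0.02, 0.98)          # random noise level
    alpha = math.cos(0.5 * math.pi * t)  # assumed cosine schedule
    sigma = math.sin(0.5 * math.pi * t)
    w = sigma ** 2                       # assumed weighting w(t)
    grads = []
    for x in theta:                      # identity generator: x = theta, dx/dtheta = 1
        eps = rng.gauss(0.0, 1.0)
        z_t = alpha * x + sigma * eps    # noise the "rendering"
        grads.append(w * (toy_eps_hat(z_t, alpha, sigma) - eps))
    return grads


def optimize(steps=2000, lr=0.05, seed=0):
    """Gradient descent on randomly initialized parameters; the prior stays frozen."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(4)]
    for _ in range(steps):
        grad = distillation_grad(theta, rng)
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(optimize())  # parameters end up close to the prior mean mu = 2.0
```

Only the generator parameters are updated, and the noise predictor is queried without being modified, which mirrors the claim that the image diffusion model needs no changes. In this toy setting the parameters drift toward the mode of the prior; in the actual method the same kind of update would flow through a differentiable renderer into the 3D model's parameters.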

Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall • 2022

Related benchmarks

Task	Dataset	Metric	Result
Text-to-3D Generation	GPTEval3D 110 prompts	CP	0.22	20
Text-to-3D Generation	GPTEval3D 110 prompts 1.0	GPTEval3D Alignment	1.00e+3	20
Text-to-3D Generation	T³Bench Multiple Objects	Quality Score	17.3	16
Text-to-3D Generation	MATE-3D	HyperScore Alignment	4.6	15
Text-to-3D Generation	T³Bench Single Object with Surroundings	BRISQUE	90.2	14
Text-to-3D Generation	T3Bench (test)	Single Object Score	24.4	14
System Identification	Synthetic dataset	Rel Error (delta_mu)	0.005	12
Text-to-3D Generation	Objaverse	CLIP Score	0.245	12
Text-to-3D Generation	T³Bench Single Object	Alignment Score	24	11
Text-to-3D Generation	GPTEval3D 60 prompts	Proportion	61	10

Showing 10 of 51 rows
