MVDream: Multi-view Diffusion for 3D Generation
About
We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation.
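The abstract applies the multi-view prior to 3D generation via Score Distillation Sampling (SDS). As a rough illustration of the SDS idea (from DreamFusion: add noise to a rendering, query the diffusion model's noise prediction, and use the residual as a gradient while skipping the U-Net Jacobian), here is a toy numpy sketch. The renderer is the identity and `toy_denoiser` is a hypothetical stand-in for a text-conditioned diffusion model, not MVDream's actual network or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_noisy, t, target):
    # Hypothetical stand-in for a text-conditioned diffusion model's noise
    # prediction: it "knows" the image implied by the prompt (`target`),
    # so the implied epsilon is the noise that maps target -> x_noisy.
    alpha_bar = 1.0 - t  # toy noise schedule, not a real one
    return (x_noisy - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def sds_step(params, target, lr=0.1):
    # One Score Distillation Sampling step:
    # 1) "render" the 3D representation (identity map here for simplicity),
    # 2) add noise at a random timestep t,
    # 3) query the model's noise prediction,
    # 4) take grad = w(t) * (eps_pred - eps), skipping the U-Net Jacobian.
    t = rng.uniform(0.02, 0.98)
    alpha_bar = 1.0 - t
    eps = rng.standard_normal(params.shape)
    x_noisy = np.sqrt(alpha_bar) * params + np.sqrt(1.0 - alpha_bar) * eps
    eps_pred = toy_denoiser(x_noisy, t, target)
    grad = (1.0 - alpha_bar) * (eps_pred - eps)  # w(t) = 1 - alpha_bar
    return params - lr * grad

# Usage: the optimized "rendering" drifts toward the prior's preferred image.
target = np.ones((4, 4))
params = np.zeros((4, 4))
for _ in range(200):
    params = sds_step(params, target)
```

In a real 2D-lifting pipeline, `params` would be a NeRF or other 3D representation, the render step would be differentiable, and the gradient would flow through camera-conditioned multi-view renderings rather than a single identity image.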
Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang • 2023
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-3D Generation | GPTEval3D 110 prompts 1.0 | GPTEval3D Alignment | 1.27e+3 | 20 |
| Text-to-3D Generation | Objaverse | CLIP Score | 0.262 | 12 |
| Text-to-3D Generation | 113 text-to-3D prompt objects (test) | Geometry CLIP Score | 24.8003 | 8 |
| 3D Material Refinement Preference | Objaverse | GPT Evaluation Score | 44.1 | 8 |
| Text-to-Apparel Generation | 30x5 custom apparel descriptions 1.0 (test) | BLIP-VQA | 0.7 | 8 |
| Multi-View Reconstruction | DreamFusion (test) | Avg MRC | 0.1222 | 7 |
| Text-to-Hair Generation | Hair Generation Prompts (test) | BLIP-VQA | 90 | 7 |
| Text-to-3D Generation | COCO (val) | FID | 133.1 | 7 |
| Text-to-Hair Generation | Prompt List quantitative experiments | FID | 215.1 | 7 |
| Text-to-3D Generation | 30 multi-object scenes | CLIP R1-Precision | 89.2 | 5 |
Showing 10 of 17 benchmark rows.
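Several rows above rank MVDream by CLIP Score, which measures text-image alignment as the cosine similarity between CLIP embeddings of the rendered image and the prompt. A minimal sketch of that computation, using placeholder embedding vectors rather than a real CLIP model:

```python
import numpy as np

def clip_score(image_emb, text_emb):
    # Cosine similarity between L2-normalized embeddings; real CLIP Score
    # pipelines average this over many prompts and rendered views.
    a = np.asarray(image_emb, dtype=float)
    b = np.asarray(text_emb, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

# Usage with toy 4-d "embeddings" (a real CLIP embedding is 512-d or more):
aligned = clip_score([1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0])      # -> 1.0
orthogonal = clip_score([1.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0])   # -> 0.0
```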