MVDream: Multi-view Diffusion for 3D Generation
About
We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation.
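The abstract applies the multi-view prior to 3D generation via Score Distillation Sampling (SDS). As a rough illustration of the SDS idea (from DreamFusion: add noise to a rendering, query the diffusion model's noise prediction, and use the residual as a gradient while skipping the U-Net Jacobian), here is a toy numpy sketch. The renderer is the identity and `toy_denoiser` is a hypothetical stand-in for a text-conditioned diffusion model, not MVDream's actual network or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_noisy, t, target):
    # Hypothetical stand-in for a text-conditioned diffusion model's noise
    # prediction: it "knows" the image implied by the prompt (`target`),
    # so the implied epsilon is the noise that maps target -> x_noisy.
    alpha_bar = 1.0 - t  # toy noise schedule, not a real one
    return (x_noisy - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def sds_step(params, target, lr=0.1):
    # One Score Distillation Sampling step:
    # 1) "render" the 3D representation (identity map here for simplicity),
    # 2) add noise at a random timestep t,
    # 3) query the model's noise prediction,
    # 4) take grad = w(t) * (eps_pred - eps), skipping the U-Net Jacobian.
    t = rng.uniform(0.02, 0.98)
    alpha_bar = 1.0 - t
    eps = rng.standard_normal(params.shape)
    x_noisy = np.sqrt(alpha_bar) * params + np.sqrt(1.0 - alpha_bar) * eps
    eps_pred = toy_denoiser(x_noisy, t, target)
    grad = (1.0 - alpha_bar) * (eps_pred - eps)  # w(t) = 1 - alpha_bar
    return params - lr * grad

# Usage: the optimized "rendering" drifts toward the prior's preferred image.
target = np.ones((4, 4))
params = np.zeros((4, 4))
for _ in range(200):
    params = sds_step(params, target)
```

In a real 2D-lifting pipeline, `params` would be a NeRF or other 3D representation, the render step would be differentiable, and the gradient would flow through camera-conditioned multi-view renderings rather than a single identity image.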
Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang • 2023
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-3D Generation | GPTEval3D 110 prompts 1.0 | GPTEval3D Alignment | 1.27e+3 | 20 |
| Text-to-3D Generation | Objaverse | CLIP Score | 0.262 | 12 |
| Text-to-3D Generation | 113 text-to-3D prompt objects (test) | Geometry CLIP Score | 24.8003 | 8 |
| 3D Material Refinement Preference | Objaverse | GPT Evaluation Score | 44.1 | 8 |
| Text-to-Apparel Generation | 30x5 custom apparel descriptions 1.0 (test) | BLIP-VQA | 0.7 | 8 |
| Multi-View Reconstruction | DreamFusion (test) | Avg MRC | 0.1222 | 7 |
| Text-to-Hair Generation | Hair Generation Prompts (test) | BLIP-VQA | 90 | 7 |
| Text-to-3D Generation | COCO (val) | FID | 133.1 | 7 |
| Text-to-Hair Generation | Prompt List quantitative experiments | FID | 215.1 | 7 |
| Text-to-3D Generation | 30 multi-object scenes | CLIP R1-Precision | 89.2 | 5 |
Showing 10 of 17 benchmark rows.
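Several rows above rank MVDream by CLIP Score, which measures text-image alignment as the cosine similarity between CLIP embeddings of the rendered image and the prompt. A minimal sketch of that computation, using placeholder embedding vectors rather than a real CLIP model:

```python
import numpy as np

def clip_score(image_emb, text_emb):
    # Cosine similarity between L2-normalized embeddings; real CLIP Score
    # pipelines average this over many prompts and rendered views.
    a = np.asarray(image_emb, dtype=float)
    b = np.asarray(text_emb, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

# Usage with toy 4-d "embeddings" (a real CLIP embedding is 512-d or more):
aligned = clip_score([1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0])      # -> 1.0
orthogonal = clip_score([1.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0])   # -> 0.0
```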