
Point-E: A System for Generating 3D Point Clouds from Complex Prompts

About

While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e.
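The abstract describes a two-stage pipeline: a text-to-image diffusion model produces a single synthetic view, and an image-conditioned point cloud diffusion model turns that view into a 3D point cloud. A minimal structural sketch of that flow is below; the two sampling functions are stand-ins (random outputs), not the actual models, which are released in the linked repository. All function names here are illustrative assumptions.

```python
import numpy as np

def text_to_image(prompt: str, size: int = 64) -> np.ndarray:
    # Stand-in for the stage-1 text-to-image diffusion model:
    # returns a synthetic RGB "view" of the prompted object.
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.random((size, size, 3))

def image_to_point_cloud(image: np.ndarray, n_points: int = 1024) -> np.ndarray:
    # Stand-in for the stage-2 point cloud diffusion model, which
    # conditions on the generated image. Emits N points, each with
    # XYZ coordinates and an RGB color -> array of shape (N, 6).
    rng = np.random.default_rng(int(image.sum() * 1e6) % 2**32)
    xyz = rng.normal(size=(n_points, 3))       # point coordinates
    rgb = rng.random((n_points, 3))            # per-point colors
    return np.concatenate([xyz, rgb], axis=1)

def generate_3d(prompt: str) -> np.ndarray:
    """Two-stage generation: text -> single synthetic view -> point cloud."""
    view = text_to_image(prompt)
    return image_to_point_cloud(view)

cloud = generate_3d("a red mug")
print(cloud.shape)  # (1024, 6)
```

The speed advantage claimed in the abstract comes from this factoring: both stages are single diffusion sampling runs, rather than the per-sample optimization loops used by slower text-to-3D methods.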

Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, Mark Chen • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Reconstruction | ShapeNet (test) | – | – | 74 |
| Text-to-3D Generation | GPTEval3D (110 prompts) | Alignment | 725.2 | 20 |
| 2D-to-3D Reconstruction | ShapeNet (test) | Chamfer Distance | 22.93 | 18 |
| 3D Shape Reconstruction | OmniObject3D | CD | 0.448 | 17 |
| Image-to-3D Generation | NeRF4 | CLIP-Similarity | 0.48 | 12 |
| Text-to-3D Generation | Objaverse | CLIP Score | 0.22 | 12 |
| 3D Shape Reconstruction | Pix3D | FS@1 | 0.1779 | 10 |
| 3D Reconstruction | GSO, 13 instances (test) | Chamfer Distance | 0.0426 | 8 |
| 3D Reconstruction | Google Scanned Objects (GSO), 30 instances | Chamfer Distance | 0.043 | 8 |
| Single-view 3D Reconstruction | Google Scanned Objects (GSO), 13 instances | Chamfer Distance | 0.0426 | 8 |

Showing 10 of 21 rows.
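Several of the rows above report Chamfer Distance, a standard point-cloud reconstruction metric: for each point in one set, take the distance to its nearest neighbor in the other set, and sum the averages in both directions. A minimal NumPy sketch using the squared-distance variant is shown below; note that exact conventions (squared vs. unsquared distances, averaging vs. summing) vary between benchmarks, so values are not directly comparable across tables without checking the convention used.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances: for each point, the squared
    distance to its nearest neighbor in the other set, averaged over
    both directions and summed.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Identical clouds have distance exactly 0.
pts = np.random.default_rng(0).random((100, 3))
print(chamfer_distance(pts, pts))  # 0.0
```

This brute-force O(N·M) formulation is fine for small clouds; benchmark implementations typically use a KD-tree or GPU nearest-neighbor search for the point counts Point-E produces.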
