Shap-E: Generating Conditional 3D Implicit Functions

About

We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. We release model weights, inference code, and samples at https://github.com/openai/shap-e.
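The central idea in the abstract, that the generated object is a vector of implicit-function parameters which can be queried for more than one representation, can be illustrated with a toy sketch. This is not the Shap-E implementation. The tiny MLP, the hidden width, the split of outputs into signed distance, density, and RGB heads, and the `sample_theta` stand-in for the diffusion model are all assumptions made for illustration.

```python
import math
import random

HIDDEN = 8  # hypothetical hidden width, chosen only for illustration
N_OUT = 5   # assumed output heads: 1 signed distance, 1 density, 3 RGB channels
N_PARAMS = 3 * HIDDEN + HIDDEN + HIDDEN * N_OUT + N_OUT


def unpack(theta):
    """Split a flat parameter vector into the weights of a tiny two-layer MLP."""
    assert len(theta) == N_PARAMS
    i = 0
    w1 = [theta[i + r * 3: i + (r + 1) * 3] for r in range(HIDDEN)]
    i += 3 * HIDDEN
    b1 = theta[i: i + HIDDEN]
    i += HIDDEN
    w2 = [theta[i + r * HIDDEN: i + (r + 1) * HIDDEN] for r in range(N_OUT)]
    i += HIDDEN * N_OUT
    b2 = theta[i: i + N_OUT]
    return w1, b1, w2, b2


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def query(theta, xyz):
    """Evaluate the implicit function defined by theta at one 3D point."""
    w1, b1, w2, b2 = unpack(theta)
    h = [math.tanh(sum(w * x for w, x in zip(row, xyz)) + b)
         for row, b in zip(w1, b1)]
    out = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    return {
        "sdf": out[0],                            # could drive mesh extraction
        "density": math.log1p(math.exp(out[1])),  # softplus keeps density non-negative
        "rgb": [sigmoid(c) for c in out[2:5]],    # colors squashed into [0, 1]
    }


def sample_theta(seed):
    """Stand-in for the conditional diffusion model: plain Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(N_PARAMS)]


theta = sample_theta(0)
result = query(theta, [0.1, -0.2, 0.3])
print(sorted(result.keys()))
```

In this sketch a single `theta` answers both mesh-style queries (`sdf`) and radiance-field-style queries (`density`, `rgb`), which is the sense in which one generated parameter vector can back multiple renderings.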

Heewoo Jun, Alex Nichol • 2023

Related benchmarks

Task	Dataset	Result
Text-to-3D	Toys4k	CLIP Score 25.12	25
Text-to-3D Generation	GPTEval3D 110 prompts 1.0	GPTEval3D Alignment 842.8	20
Single-view 3D Reconstruction	GSO (test)	CD 0.204	18
3D Asset Reconstruction	Toys4k	CD 0.6724	18
3D Shape Reconstruction	OmniObject3D	CD 0.434	17
Text-to-3D Generation	Objaverse	CLIP Score 30.52	12
Image-to-3D	Toys4k	FD (Inception) 34.72	11
Image-to-3D	Toys4k	CLIP Similarity 82.15	11
3D Shape Reconstruction	Pix3D	FS@1 0.2016	10
Image-conditioned 3D Generation	Objaverse (test)	FID 138.5	10

Showing 10 of 39 rows
