Zero-1-to-3: Zero-shot One Image to 3D Object
About
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this under-constrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model is trained on a synthetic dataset to learn control over the relative camera viewpoint, allowing new images of the same object to be generated under a specified camera transformation. Despite being trained on synthetic data, our model retains strong zero-shot generalization to out-of-distribution datasets as well as in-the-wild images, including impressionist paintings. Our viewpoint-conditioned diffusion approach can further be used for 3D reconstruction from a single image. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
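As a minimal sketch of the viewpoint conditioning described above: the relative camera transformation between the input and target views can be expressed in spherical coordinates and embedded as a small vector that conditions the diffusion model. The four-dimensional encoding below (polar offset, sine/cosine of the azimuth offset, radius offset) follows the paper's spherical-coordinate parameterization, but the function name and exact layout here are illustrative assumptions, not the released implementation.

```python
import math

def relative_pose_embedding(theta_src, phi_src, r_src,
                            theta_tgt, phi_tgt, r_tgt):
    """Encode a relative camera transform in spherical coordinates.

    Inputs are (polar angle theta, azimuth phi, radius r) for the source
    and target cameras, in radians / scene units. Returns a 4-vector
    (d_theta, sin(d_phi), cos(d_phi), d_r); azimuth is wrapped through
    sin/cos so that the embedding is continuous at the 0/2*pi boundary.
    This vector would then be fed to the conditional diffusion model.
    """
    d_theta = theta_tgt - theta_src
    d_phi = phi_tgt - phi_src
    d_r = r_tgt - r_src
    return [d_theta, math.sin(d_phi), math.cos(d_phi), d_r]
```

In practice such an embedding is concatenated with an image embedding of the input view before conditioning the denoiser; the sketch covers only the pose-encoding step.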
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | THuman 2.0 (test) | LPIPS | 0.1163 | 39 |
| 3D Reconstruction | Google Scanned Objects (GSO) (test) | LPIPS | 0.23 | 17 |
| Novel View Synthesis | Google Scanned Objects | PSNR | 18.51 | 15 |
| Novel View Synthesis | Google Scanned Objects (GSO) (test) | PSNR | 18.93 | 14 |
| Novel View Synthesis | Objaverse (test) | PSNR | 17.37 | 14 |
| Novel View Synthesis | InterHand2.6M (test) | LPIPS | 0.17 | 12 |
| Novel View Synthesis | GSO challenging | PSNR | 21.79 | 10 |
| 2D Multi-view Generation | Anime3D++ (test) | SSIM | 0.865 | 10 |
| Multi-view Generation | GSO | PSNR | 18.8219 | 9 |
| Multi-view Generation | 3D-FUTURE | PSNR | 17.0526 | 9 |