Zero-1-to-3: Zero-shot One Image to 3D Object

About

We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this under-constrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint, which allow new images to be generated of the same object under a specified camera transformation. Even though it is trained on a synthetic dataset, our model retains a strong zero-shot generalization ability to out-of-distribution datasets as well as in-the-wild images, including impressionist paintings. Our viewpoint-conditioned diffusion approach can further be used for the task of 3D reconstruction from a single image. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.

Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick • 2023
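
To make the idea of a "specified camera transformation" concrete, here is a minimal sketch of one way a relative viewpoint between an input view and a target view could be encoded as a conditioning vector. The abstract does not state how the camera is parameterized, so the spherical coordinates (polar angle, azimuth, radius), the sin/cos encoding of the azimuth difference, and the function name `relative_viewpoint` are all assumptions made for illustration, not the authors' confirmed implementation.

```python
import math

def relative_viewpoint(source, target):
    """Encode the camera change from `source` to `target`.

    Both arguments are hypothetical (polar, azimuth, radius) tuples,
    with angles in radians. This parameterization is an assumption;
    the abstract only says the model is conditioned on a relative
    camera transformation.
    """
    d_polar = target[0] - source[0]
    # Wrap the azimuth difference into [0, 2*pi) so that equivalent
    # rotations map to the same value.
    d_azimuth = (target[1] - source[1]) % (2.0 * math.pi)
    d_radius = target[2] - source[2]
    # sin/cos removes the discontinuity at the 0 / 2*pi boundary.
    return (d_polar, math.sin(d_azimuth), math.cos(d_azimuth), d_radius)

# Example: rotate the camera 90 degrees around the object and raise it
# by 30 degrees, keeping the same distance.
cond = relative_viewpoint(
    (math.pi / 2, 0.0, 1.5),
    (math.pi / 3, math.pi / 2, 1.5),
)
```

A vector like `cond` would then be passed to the diffusion model alongside the input image; how it is injected into the network is not described in the abstract and is left out of this sketch.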

Related benchmarks

Task | Dataset | Result | Rank
Novel View Synthesis | THuman 2.0 (test) | LPIPS 0.1163 | 39
3D Reconstruction | Google Scanned Objects (GSO) (test) | LPIPS 0.23 | 17
Novel View Synthesis | Google Scanned Objects | PSNR 18.51 | 15
Novel View Synthesis | Google Scanned Objects (GSO) (test) | PSNR 18.93 | 14
Novel View Synthesis | Objaverse (test) | PSNR 17.37 | 14
Novel View Synthesis | InterHand2.6M (test) | LPIPS 0.17 | 12
Novel View Synthesis | GSO challenging | PSNR 21.79 | 10
2D Multi-view Generation | Anime3D++ (test) | SSIM 0.865 | 10
Multi-view Generation | GSO | PSNR 18.8219 | 9
Multi-view Generation | 3D-FUTURE | PSNR 17.0526 | 9
Showing 10 of 56 rows
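
Several rows above report PSNR, where higher is better. For reference, here is a minimal sketch of the standard PSNR definition, 10 · log10(MAX² / MSE), assuming pixel values normalized to [0, 1] and images flattened into equal-length sequences. The exact evaluation protocol behind each benchmark row (resolution, masking, color space) is not given on this page, so this will not necessarily reproduce the listed numbers.

```python
import math

def psnr(reference, candidate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    `reference` and `candidate` are equal-length sequences of pixel
    values; `max_value` is the largest possible pixel value
    (1.0 for normalized images, an assumption of this sketch).
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A uniform error of 0.1 on every pixel gives an MSE of 0.01,
# which corresponds to roughly 20 dB.
score = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```

LPIPS and SSIM, the other metrics in the table, are perceptual and structural measures that need more than a per-pixel error and are not sketched here.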
