Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
About
We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while using only multi-view 2D data for supervision. We note that there is a one-to-one mapping between viewsets, i.e., collections of several 2D views of an object, and 3D models. Hence, we train a diffusion model to generate viewsets, but design the neural network generator to internally reconstruct the corresponding 3D models, thus generating those too. We fit a diffusion model to a large number of viewsets for a given category of objects. The resulting generator can be conditioned on zero, one, or more input views. Conditioned on a single view, it performs 3D reconstruction that accounts for the ambiguity of the task, making it possible to sample multiple 3D models compatible with the input. The model performs reconstruction efficiently, in a feed-forward manner, and is trained with only rendering losses, using as few as three views per viewset. Project page: szymanowiczs.github.io/viewset-diffusion.
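The abstract compresses the core training recipe: noise an entire viewset, have the network fuse it into one shared representation of the object (in the paper, an internal 3D model that is rendered back to each camera), and supervise every denoised view against the clean ones. The toy PyTorch sketch below illustrates only that data flow; `ToyViewsetDenoiser`, the mean-pooled latent code, the pose-conditioned MLP decoder, and the linear noising schedule are all illustrative stand-ins, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ToyViewsetDenoiser(nn.Module):
    """Toy sketch: denoise a whole viewset by pooling it into a single
    shared code (a stand-in for the internal 3D reconstruction), then
    decode one image per camera so all views come from one object."""
    def __init__(self, img_res=32, pose_dim=12, code_dim=128):
        super().__init__()
        self.img_res = img_res
        self.encode = nn.Sequential(
            nn.Flatten(start_dim=2),                       # (B, V, 3*H*W)
            nn.Linear(3 * img_res * img_res, code_dim),
            nn.ReLU(),
        )
        # Pose- and time-conditioned decoder: a crude stand-in for
        # rendering the internal 3D model from each camera.
        self.decode = nn.Linear(code_dim + pose_dim + 1,
                                3 * img_res * img_res)

    def forward(self, noisy_views, poses, t):
        # noisy_views: (B, V, 3, H, W); poses: (B, V, pose_dim); t: (B,)
        B, V = noisy_views.shape[:2]
        codes = self.encode(noisy_views)                   # (B, V, code_dim)
        # Mean-pool over views: one shared "object" code per viewset.
        obj = codes.mean(dim=1, keepdim=True).expand(-1, V, -1)
        tv = t.view(B, 1, 1).expand(-1, V, -1)
        out = self.decode(torch.cat([obj, poses, tv], dim=-1))
        return out.view(B, V, 3, self.img_res, self.img_res)

# One training step: noise the viewset, denoise through the shared code,
# and supervise all views with a rendering-style L2 loss.
model = ToyViewsetDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
views = torch.rand(4, 3, 3, 32, 32)    # 4 viewsets, 3 views each (as in the paper)
poses = torch.rand(4, 3, 12)           # flattened 3x4 camera matrices (illustrative)
t = torch.rand(4)                      # diffusion time in [0, 1]
tb = t.view(-1, 1, 1, 1, 1)
noisy = (1 - tb) * views + tb * torch.randn_like(views)
loss = ((model(noisy, poses, t) - views) ** 2).mean()
loss.backward()
opt.step()
```

One natural way to realize the zero/one/few-view conditioning described above within this sketch is to keep any clean conditioning views un-noised in the viewset at every sampling step, so the shared code stays consistent with them; with no clean views, the same model samples objects unconditionally.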
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel View Synthesis | ShapeNet cars category | PSNR | 23.29 | 20 |
| Image-to-3D Generation | Synthetic 3D Objects (test) | Ewarp | 0.0021 | 6 |
| Single-image reconstruction | CO3D v2 (test) | PSNR (Teddybear) | 19.68 | 3 |
| Unconditional Generation | CO3D Teddybear v2 (test) | FID | 201.7 | 3 |
| Unconditional Generation | CO3D Hydrant v2 (test) | FID | 138.4 | 3 |
| Unconditional Generation | CO3D Donut v2 (test) | FID | 199.1 | 3 |
| Unconditional Generation | CO3D Apple v2 (test) | FID | 183.7 | 3 |