HoloGAN: Unsupervised learning of 3D representations from natural images

About

We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. Most generative models rely on 2D kernels to generate images and make few assumptions about the 3D world. These models therefore tend to create blurry images or artefacts in tasks that require a strong 3D understanding, such as novel-view synthesis. HoloGAN instead learns a 3D representation of the world and learns to render this representation in a realistic manner. Unlike other GANs, HoloGAN provides explicit control over the pose of generated objects through rigid-body transformations of the learnt 3D features. Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with similar or higher visual quality than other generative models. HoloGAN can be trained end-to-end from unlabelled 2D images only. In particular, we do not require pose labels, 3D shapes, or multiple views of the same objects. This shows that HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner.
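To make the idea of pose control through a rigid-body transformation of 3D features more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `rotation_y` and `rotate_features`, the `(C, D, H, W)` layout, the choice of a rotation about the vertical axis, and the nearest-neighbour resampling are all assumptions made for illustration.

```python
import numpy as np

def rotation_y(theta):
    """3x3 rotation matrix about the vertical (y) axis; theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def rotate_features(feats, theta):
    """Rotate a (C, D, H, W) feature volume about its centre (hypothetical helper).

    Uses inverse mapping with nearest-neighbour lookup; output voxels whose
    source falls outside the volume are set to zero.
    """
    C, D, H, W = feats.shape
    # Index grids for every output voxel, ordered (z, y, x) along (D, H, W).
    z, y, x = np.meshgrid(np.arange(D), np.arange(H), np.arange(W), indexing="ij")
    centre = np.array([(W - 1) / 2, (H - 1) / 2, (D - 1) / 2])
    out_xyz = np.stack([x, y, z], axis=-1).reshape(-1, 3) - centre
    # Inverse mapping: source = R^T @ target, written here for row vectors.
    src = out_xyz @ rotation_y(theta) + centre
    idx = np.rint(src).astype(int)
    sx, sy, sz = idx[:, 0], idx[:, 1], idx[:, 2]
    valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H) & (sz >= 0) & (sz < D)
    out = np.zeros((C, D * H * W), dtype=feats.dtype)
    out[:, valid] = feats[:, sz[valid], sy[valid], sx[valid]]
    return out.reshape(C, D, H, W)

# Usage: turn a random 8-channel 16^3 feature volume by 30 degrees.
feats = np.random.default_rng(0).normal(size=(8, 16, 16, 16))
rotated = rotate_features(feats, np.deg2rad(30.0))
```

Because the same learnt features are resampled under different rotations, the pose changes while the feature content stays the same, which is the intuition behind separating pose from identity. A trainable model would need a differentiable resampling step (for example trilinear interpolation) rather than the nearest-neighbour lookup used here.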

Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, Yong-Liang Yang • 2019

Related benchmarks

Task	Dataset	Result
Unconditional image synthesis	FFHQ 256x256 (test)	FID 90.9	31
Image Generation	Faces	FID 31	18
Image Generation	Cars	FID 180	12
Unconditional image synthesis	AFHQ 256x256 (test)	FID 95.6	12
Image Generation	CUB Birds 200-2011 (test)	FID 78	9
Image Generation	Stanford Cars (test)	FID 134	9
Image Synthesis	Cats (test)	FID 27	6
Unsupervised 3D-aware image synthesis	CelebA 128x128 (test)	FID 39.7	5
3D-aware Image Synthesis	CARLA 64x64 resolution	FID 134	5
3D-aware Image Synthesis	CARLA 128x128 resolution	FID 67.5	5

Showing 10 of 17 rows
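
The benchmark rows listed here report the Fréchet Inception Distance (FID), where lower values indicate generated images that are statistically closer to real ones. For reference, the standard definition is given below; the symbols are notation chosen here, with $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denoting the mean and covariance of Inception features for real and generated images respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Scores are only comparable within a row's dataset and resolution, since the feature statistics depend on both.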
