Variational Lossy Autoencoder
About
Representation learning seeks to expose certain aspects of observed data in a learned representation that is amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining the Variational Autoencoder (VAE) with neural autoregressive models such as RNNs, MADE, and PixelRNN/CNN. Our proposed VAE model gives us control over what the global latent code can learn and, by designing the architecture accordingly, we can force the global latent code to discard irrelevant information such as texture in 2D images, so that the VAE only "autoencodes" data in a lossy fashion. In addition, by leveraging autoregressive models as both the prior distribution $p(z)$ and the decoding distribution $p(x|z)$, we can greatly improve the generative modeling performance of VAEs, achieving new state-of-the-art results on the MNIST, OMNIGLOT, and Caltech-101 Silhouettes density estimation tasks.
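The core objective described above can be illustrated with a toy Monte Carlo ELBO computation in which the prior $p(z) = \prod_i p(z_i \mid z_{<i})$ is autoregressive. This is only a minimal sketch, not the paper's model: the networks are replaced by fixed linear maps, the dimensions and weights are arbitrary, and the autoregressive reconstruction term $\log p(x|z)$ (a PixelCNN in the paper) is a stand-in constant.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4  # toy latent dimensionality (illustrative, not from the paper)

# "Encoder" output for one datapoint: q(z|x) = N(mu, diag(sigma^2)).
# Values are arbitrary placeholders for a neural encoder's output.
mu = np.array([0.5, -0.2, 0.1, 0.3])
log_sigma = np.array([-1.0, -0.5, -0.8, -1.2])

# Autoregressive prior: the mean of z_i depends only on z_{<i},
# enforced by a strictly lower-triangular weight matrix
# (a MADE-style masked dependency, here just a fixed linear map).
W = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1

def log_normal(x, mean, log_std):
    """Elementwise log-density of N(mean, exp(log_std)^2)."""
    return -0.5 * np.log(2 * np.pi) - log_std - 0.5 * ((x - mean) / np.exp(log_std)) ** 2

# Reparameterized sample z ~ q(z|x): z = mu + sigma * eps.
eps = rng.normal(size=D)
z = mu + np.exp(log_sigma) * eps

# log q(z|x) and the autoregressive log p(z) with unit-variance conditionals.
log_q = log_normal(z, mu, log_sigma).sum()
log_p = log_normal(z, W @ z, np.zeros(D)).sum()

# Stand-in reconstruction term log p(x|z); in the paper this is itself
# an autoregressive model conditioned on the global code z.
log_px_z = -1.0

# Single-sample Monte Carlo estimate of the ELBO.
elbo = log_px_z + log_p - log_q
print(float(elbo))
```

Because `W` is strictly lower-triangular, row $i$ of `W @ z` uses only $z_{<i}$, which is what makes the prior autoregressive while still allowing the whole log-density to be computed in one pass.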
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Density Estimation | CIFAR-10 (test) | Bits/dim | 2.95 | 134 |
| Generative Modeling | CIFAR-10 (test) | NLL (bits/dim) | 2.95 | 62 |
| Log-likelihood estimation | MNIST dynamically binarized (test) | Log-Likelihood | 78.53 | 48 |
| Generative Modeling | CIFAR-10 | BPD | 2.95 | 46 |
| Density Estimation | binarized MNIST 28x28 (test) | Test LogL | 79.03 | 44 |
| Image Modeling | Omniglot (test) | NLL | 89.83 | 27 |
| Density Estimation | OMNIGLOT dynamically binarized (test) | NLL | 89.83 | 16 |
| Generative Modeling | MNIST | -- | -- | 10 |