Variational Lossy Autoencoder
About
Representation learning seeks to expose certain aspects of observed data in a learned representation that is amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining the Variational Autoencoder (VAE) with neural autoregressive models such as RNNs, MADE, and PixelRNN/CNN. Our proposed VAE model gives us control over what the global latent code can learn and, by designing the architecture accordingly, we can force the global latent code to discard irrelevant information such as texture in 2D images, so that the VAE only "autoencodes" data in a lossy fashion. In addition, by leveraging autoregressive models as both the prior distribution $p(z)$ and the decoding distribution $p(x|z)$, we can greatly improve the generative modeling performance of VAEs, achieving new state-of-the-art results on the MNIST, OMNIGLOT, and Caltech-101 Silhouettes density estimation tasks.
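The core objective described above can be illustrated with a toy Monte Carlo ELBO computation in which the prior $p(z) = \prod_i p(z_i \mid z_{<i})$ is autoregressive. This is only a minimal sketch, not the paper's model: the networks are replaced by fixed linear maps, the dimensions and weights are arbitrary, and the autoregressive reconstruction term $\log p(x|z)$ (a PixelCNN in the paper) is a stand-in constant.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4  # toy latent dimensionality (illustrative, not from the paper)

# "Encoder" output for one datapoint: q(z|x) = N(mu, diag(sigma^2)).
# Values are arbitrary placeholders for a neural encoder's output.
mu = np.array([0.5, -0.2, 0.1, 0.3])
log_sigma = np.array([-1.0, -0.5, -0.8, -1.2])

# Autoregressive prior: the mean of z_i depends only on z_{<i},
# enforced by a strictly lower-triangular weight matrix
# (a MADE-style masked dependency, here just a fixed linear map).
W = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1

def log_normal(x, mean, log_std):
    """Elementwise log-density of N(mean, exp(log_std)^2)."""
    return -0.5 * np.log(2 * np.pi) - log_std - 0.5 * ((x - mean) / np.exp(log_std)) ** 2

# Reparameterized sample z ~ q(z|x): z = mu + sigma * eps.
eps = rng.normal(size=D)
z = mu + np.exp(log_sigma) * eps

# log q(z|x) and the autoregressive log p(z) with unit-variance conditionals.
log_q = log_normal(z, mu, log_sigma).sum()
log_p = log_normal(z, W @ z, np.zeros(D)).sum()

# Stand-in reconstruction term log p(x|z); in the paper this is itself
# an autoregressive model conditioned on the global code z.
log_px_z = -1.0

# Single-sample Monte Carlo estimate of the ELBO.
elbo = log_px_z + log_p - log_q
print(float(elbo))
```

Because `W` is strictly lower-triangular, row $i$ of `W @ z` uses only $z_{<i}$, which is what makes the prior autoregressive while still allowing the whole log-density to be computed in one pass.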
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Density Estimation | CIFAR-10 (test) | Bits/dim | 2.95 | 134 |
| Generative Modeling | CIFAR-10 (test) | NLL (bits/dim) | 2.95 | 62 |
| Log-likelihood estimation | MNIST dynamically binarized (test) | Log-Likelihood | 78.53 | 48 |
| Generative Modeling | CIFAR-10 | BPD | 2.95 | 46 |
| Density Estimation | binarized MNIST 28x28 (test) | Test LogL | 79.03 | 44 |
| Image Modeling | Omniglot (test) | NLL | 89.83 | 27 |
| Density Estimation | OMNIGLOT dynamically binarized (test) | NLL | 89.83 | 16 |
| Generative Modeling | MNIST | -- | -- | 10 |