BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling
About
With the introduction of the variational autoencoder (VAE), probabilistic latent variable models have received renewed attention as powerful generative models. However, their performance in terms of test likelihood and quality of generated samples has been surpassed by autoregressive models without stochastic units. Furthermore, flow-based models have recently been shown to be an attractive alternative that scales well to high-dimensional data. In this paper we close the performance gap by constructing VAE models that can effectively utilize a deep hierarchy of stochastic variables and model complex covariance structures. We introduce the Bidirectional-Inference Variational Autoencoder (BIVA), characterized by a skip-connected generative model and an inference network formed by a bidirectional stochastic inference path. We show that BIVA reaches state-of-the-art test likelihoods, generates sharp and coherent natural images, and uses the hierarchy of latent variables to capture different aspects of the data distribution. We observe that BIVA, in contrast to recent results, can be used for anomaly detection. We attribute this to the hierarchy of latent variables, which is able to extract high-level semantic features. Finally, we extend BIVA to semi-supervised classification tasks and show that it performs comparably to state-of-the-art generative adversarial networks.
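To make "a hierarchy of stochastic variables" concrete, the sketch below computes a single-sample evidence lower bound (ELBO) for a generic two-layer hierarchical VAE with a top-down generative model p(x | z1) p(z1 | z2) p(z2) and a bottom-up inference model q(z1 | x) q(z2 | z1). This is not BIVA itself: BIVA adds skip connections in the generative model and a bidirectional inference path. The function names, the linear-Gaussian layers with fixed log-variances, the Bernoulli likelihood, and the random parameters are all illustrative assumptions.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)

def softplus(a):
    # Numerically stable log(1 + exp(a)).
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def linear(W, b, v):
    # Hypothetical affine layer: W @ v + b, with lists of floats.
    return [sum(w * vi for w, vi in zip(row, v)) + bj for row, bj in zip(W, b)]

def gaussian_log_pdf(z, mean, log_var):
    # Sum over dimensions of log N(z_i; mean_i, exp(log_var_i)).
    return sum(-0.5 * (LOG_2PI + lv + (zi - m) ** 2 / math.exp(lv))
               for zi, m, lv in zip(z, mean, log_var))

def bernoulli_log_pmf(x, logits):
    # log sigmoid(l) = -softplus(-l); log(1 - sigmoid(l)) = -softplus(l).
    return sum(-xi * softplus(-l) - (1 - xi) * softplus(l)
               for xi, l in zip(x, logits))

def sample(mean, log_var, rng):
    # Reparameterized draw: mean + std * eps, eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]

def random_params(x_dim, z1_dim, z2_dim, rng):
    # Hypothetical small random weights; a real model would learn these.
    def lin(out_dim, in_dim):
        W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        return (W, [0.0] * out_dim)
    return {
        "q1": lin(z1_dim, x_dim), "q1_logvar": [0.0] * z1_dim,
        "q2": lin(z2_dim, z1_dim), "q2_logvar": [0.0] * z2_dim,
        "p1": lin(z1_dim, z2_dim), "p1_logvar": [0.0] * z1_dim,
        "px": lin(x_dim, z1_dim),
    }

def elbo_two_layer(x, params, rng):
    # Bottom-up inference path: x -> z1 -> z2.
    q1_mean, q1_logv = linear(*params["q1"], x), params["q1_logvar"]
    z1 = sample(q1_mean, q1_logv, rng)
    q2_mean, q2_logv = linear(*params["q2"], z1), params["q2_logvar"]
    z2 = sample(q2_mean, q2_logv, rng)
    # Top-down generative path: standard-normal prior on z2, then z2 -> z1 -> x.
    log_p_z2 = gaussian_log_pdf(z2, [0.0] * len(z2), [0.0] * len(z2))
    log_p_z1 = gaussian_log_pdf(z1, linear(*params["p1"], z2), params["p1_logvar"])
    log_p_x = bernoulli_log_pmf(x, linear(*params["px"], z1))
    log_q = gaussian_log_pdf(z1, q1_mean, q1_logv) + gaussian_log_pdf(z2, q2_mean, q2_logv)
    # log p(x, z1, z2) - log q(z1, z2 | x) for one sample.
    return log_p_x + log_p_z1 + log_p_z2 - log_q

if __name__ == "__main__":
    rng = random.Random(0)
    params = random_params(4, 3, 2, rng)
    x = [1, 0, 1, 1]
    # Averaging single-sample estimates reduces the variance of the bound.
    print(sum(elbo_two_layer(x, params, rng) for _ in range(200)) / 200)
```

Adding more layers follows the same pattern: each extra latent contributes one prior term and one posterior term to the bound.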
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | CIFAR-10 (test) | -- | -- | 471 |
| Density Estimation | CIFAR-10 (test) | -- | -- | 134 |
| Density Estimation | ImageNet 32x32 (test) | Bits per Sub-pixel | 3.96 | 66 |
| Generative Modeling | CIFAR-10 (test) | NLL (bits/dim) | 3.08 | 62 |
| Log-likelihood estimation | MNIST dynamically binarized (test) | Log-Likelihood | 80.6 | 48 |
| Generative Modeling | CIFAR-10 | BPD | 3.08 | 46 |
| Density Estimation | binarized MNIST 28x28 (test) | Test LogL | 81.2 | 44 |
| Generative Modeling | MNIST (test) | -- | -- | 35 |
| Generative Modeling | ImageNet 32x32 downsampled | Bits Per Dimension | 3.96 | 24 |
| Density Estimation | OMNIGLOT dynamically binarized (test) | NLL | 91.34 | 16 |
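Several image results above are reported in bits per dimension (bits per sub-pixel), which is a per-example negative log-likelihood in nats divided by D ln 2, where D is the number of dimensions (for 32x32 RGB images, D = 32 · 32 · 3 = 3072). The helper names below are illustrative; the sketch only shows the standard unit conversion.

```python
import math

def nats_to_bits_per_dim(nll_nats, num_dims):
    # Divide total NLL by the number of dimensions, then convert nats to bits.
    return nll_nats / (num_dims * math.log(2.0))

def bits_per_dim_to_nats(bpd, num_dims):
    # Inverse conversion: bits/dim back to a total NLL in nats per example.
    return bpd * num_dims * math.log(2.0)

CIFAR10_DIMS = 32 * 32 * 3  # sub-pixels per 32x32 RGB image

if __name__ == "__main__":
    # Total NLL in nats per CIFAR-10 image implied by 3.08 bits/dim.
    print(bits_per_dim_to_nats(3.08, CIFAR10_DIMS))
```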