DVAE++: Discrete Variational Autoencoders with Overlapping Transformations
About
Training discrete latent variable models remains challenging because passing gradient information through discrete units is difficult. We propose a new class of smoothing transformations based on a mixture of two overlapping distributions, and show that the proposed transformation can be used for training binary latent models with either directed or undirected priors. We derive a new variational bound to efficiently train with Boltzmann machine priors. Using this bound, we develop DVAE++, a generative model with a global discrete prior and a hierarchy of convolutional continuous variables. Experiments on several benchmarks show that overlapping transformations outperform other recent continuous relaxations of discrete latent variables, including Gumbel-Softmax (Maddison et al., 2016; Jang et al., 2016) and discrete variational autoencoders (Rolfe, 2016).
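To make the "mixture of two overlapping distributions" concrete, here is a minimal sketch of sampling a continuous relaxation ζ of a binary unit z with Bernoulli probability q. The abstract does not name the component distributions; this sketch assumes each is a truncated exponential on [0, 1] with sharpness β, r(ζ|z=0) ∝ exp(-βζ) and r(ζ|z=1) ∝ exp(β(ζ-1)), so the smoothed marginal is (1-q)·r(ζ|0) + q·r(ζ|1). Because the two components overlap on the whole interval, ζ = F⁻¹(ρ; q) with ρ ~ Uniform(0, 1) is a smooth function of q, which is what allows reparameterized gradients. Substituting t = exp(-βζ) turns F(ζ) = ρ into a quadratic in t. The function names (`mixture_cdf`, `inverse_mixture_cdf`) are hypothetical.

```python
import math
import random


def mixture_cdf(zeta: float, q: float, beta: float) -> float:
    """CDF of (1-q)*r0 + q*r1 on [0, 1] (assumed truncated-exponential components).

    r0(zeta) ∝ exp(-beta*zeta) concentrates near 0; r1(zeta) ∝ exp(beta*(zeta-1))
    concentrates near 1. Both are normalized by (1 - exp(-beta)) / beta.
    """
    norm = 1.0 - math.exp(-beta)
    cdf0 = (1.0 - math.exp(-beta * zeta)) / norm
    cdf1 = (math.exp(beta * (zeta - 1.0)) - math.exp(-beta)) / norm
    return (1.0 - q) * cdf0 + q * cdf1


def inverse_mixture_cdf(rho: float, q: float, beta: float) -> float:
    """Solve mixture_cdf(zeta, q, beta) = rho for zeta in [0, 1].

    With t = exp(-beta*zeta), the equation becomes a*t^2 + b*t + c = 0 where
    a = 1-q, b = rho*(1-e) + q*e - (1-q), c = -q*e and e = exp(-beta).
    Since c <= 0 <= a, there is exactly one non-negative root; we pick the
    algebraic form that avoids cancellation for the sign of b.
    """
    e = math.exp(-beta)
    a = 1.0 - q
    b = rho * (1.0 - e) + q * e - a
    c = -q * e
    disc = math.sqrt(b * b - 4.0 * a * c)
    if b < 0.0:
        # b < 0 implies a > 0, so the standard formula is safe here.
        t = (-b + disc) / (2.0 * a)
    else:
        # Equivalent "reversed" form; well defined even when q == 1 (a == 0).
        t = (-2.0 * c) / (b + disc)
    return -math.log(t) / beta


def sample_zeta(q: float, beta: float, rng: random.Random) -> float:
    """Reparameterized sample: zeta is a deterministic, smooth function of (rho, q)."""
    return inverse_mixture_cdf(rng.random(), q, beta)


if __name__ == "__main__":
    rng = random.Random(0)
    beta = 8.0
    for q in (0.1, 0.5, 0.9):
        samples = [sample_zeta(q, beta, rng) for _ in range(10000)]
        print(f"q={q}: mean zeta = {sum(samples) / len(samples):.3f}")

    # Central finite difference of zeta w.r.t. q at a fixed noise value rho:
    rho, q, h = 0.4, 0.5, 1e-5
    dzeta_dq = (inverse_mixture_cdf(rho, q + h, beta)
                - inverse_mixture_cdf(rho, q - h, beta)) / (2 * h)
    print(f"d zeta / d q at rho={rho}, q={q}: {dzeta_dq:.4f}")
```

Larger β makes each component sharper, so ζ sits closer to the discrete values {0, 1} at the cost of higher-variance gradients; a finite β keeps ζ differentiable in q, as the finite-difference check illustrates.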
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Generative Modeling | CIFAR-10 (test) | NLL (bits/dim) | 3.38 | 62 |
| Generative Modeling | CIFAR-10 | BPD (bits/dim) | 3.38 | 46 |
| Density Estimation | OMNIGLOT dynamically binarized (test) | NLL | 92.38 | 16 |
| Generative Modeling | Dynamically binarized MNIST (test) | -- | -- | 13 |
| Generative Modeling | MNIST | -- | -- | 10 |
| Likelihood Estimation | MNIST | NLL | 78.49 | 7 |