Oops I Took A Gradient: Scalable Sampling for Discrete Distributions
About
We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings, including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high-dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
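The abstract describes the method at a high level: a first-order (Taylor) estimate of how the log-probability changes when a coordinate is flipped drives the proposal, and a Metropolis-Hastings correction keeps the chain exact. Below is a minimal sketch of one such step for binary variables in PyTorch. This is an illustration under stated assumptions, not the paper's reference implementation: the function name `gwg_step`, the toy quadratic energy, and the constants are hypothetical, and `f` is assumed to be a differentiable log-probability (up to normalization) evaluated on a {0,1}-valued tensor.

```python
import torch

def gwg_step(f, x):
    # One MH step with a gradient-informed flip proposal for x in {0,1}^D.
    # (Sketch; `gwg_step` is a hypothetical name, not the authors' API.)
    x = x.detach().requires_grad_(True)
    fx = f(x)
    grad, = torch.autograd.grad(fx, x)
    d = -(2.0 * x - 1.0) * grad          # Taylor estimate of f(flip_i(x)) - f(x)
    q_fwd = torch.distributions.Categorical(logits=d.detach() / 2.0)
    i = q_fwd.sample()                   # propose which bit to flip
    x_new = x.detach().clone()
    x_new[i] = 1.0 - x_new[i]
    x_new.requires_grad_(True)
    f_new = f(x_new)
    grad_new, = torch.autograd.grad(f_new, x_new)
    d_new = -(2.0 * x_new - 1.0) * grad_new
    q_rev = torch.distributions.Categorical(logits=d_new.detach() / 2.0)
    # MH acceptance: target ratio times reverse/forward proposal ratio.
    log_alpha = (f_new - fx).detach() + q_rev.log_prob(i) - q_fwd.log_prob(i)
    return x_new.detach() if torch.rand(()) < log_alpha.exp() else x.detach()

# Toy usage with a hypothetical quadratic binary model f(x) = x^T W x + b^T x.
D = 16
W = torch.randn(D, D) * 0.1
W = (W + W.T) / 2.0
b = torch.randn(D) * 0.1
f = lambda x: x @ W @ x + b @ x
x = torch.bernoulli(torch.full((D,), 0.5))
for _ in range(500):
    x = gwg_step(f, x)
```

Recomputing the proposal distribution at the flipped state is what makes the reverse-proposal term in the acceptance ratio available, at the cost of a second gradient evaluation per step.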
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Conditional estimation | Dynamic MNIST (test) | Test Log Likelihood | -80.51 | 18 |
| Graph generation | Ego-small (test) | Degree | 0.095 | 11 |
| Generative Modeling | Omniglot (test) | Log Likelihood | -94.72 | 8 |
| Regression | Breast Cancer (UCI) (test) | Avg Test Log-likelihood | 0.0241 | 5 |
| Regression | COMPAS (UCI) (test) | Avg Test Log-likelihood | 0.2265 | 5 |
| Regression | HIV (UCI) (test) | Avg Test Log-likelihood | 0.7025 | 5 |
| Regression | Blog (UCI) (test) | Avg Test Log-likelihood | 0.2799 | 5 |
| Traveling Salesman Problem | eil14 | Cost | 370.7 | 5 |
| RBM learning | MNIST (test) | Log Likelihood (AIS) | -387.3 | 4 |
| RBM learning | EMNIST (test) | Log Likelihood (AIS) | -591 | 4 |