Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
About
Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
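The core idea — jointly training a control variate to reduce the variance of a score-function gradient estimator without biasing it — can be sketched on a toy discrete problem of the kind the paper uses: minimizing E[(b − t)²] for b ~ Bernoulli(σ(θ)). In this simplified sketch the control variate is a single learned scalar baseline rather than the paper's neural network, and the objective constant, learning rates, and step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(b):
    # Toy black-box objective: minimize E[(b - 0.499)^2] over the
    # Bernoulli parameter. The optimum pushes p toward 0.
    return (b - 0.499) ** 2

theta = 0.0   # logit of the Bernoulli parameter (the model parameter)
c = 0.0       # scalar control variate, trained jointly (stand-in for c_phi)
lr_theta, lr_c = 0.1, 0.1

for step in range(20000):
    p = sigmoid(theta)
    b = float(rng.random() < p)   # sample b ~ Bernoulli(p)
    dlogp = b - p                 # d/dtheta log p(b | theta) for a logit parameter
    g = (f(b) - c) * dlogp        # score-function estimator with baseline; unbiased
    theta -= lr_theta * g         # gradient step on E[f(b)]

    # Train the control variate to reduce the estimator's variance:
    # d/dc E[g^2] is estimated from a single sample via d(g^2)/dc = -2 g dlogp.
    c -= lr_c * (-2.0 * g * dlogp)

print(sigmoid(theta), c)
```

Because E[dlogp] = 0, any value of c leaves the estimator unbiased; optimizing c changes only its variance. That separation — unbiasedness for free, variance minimized by gradient descent on the control variate's parameters — is the property the full method exploits with a differentiable neural surrogate in place of the scalar baseline.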
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud • 2017
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Log-likelihood estimation | MNIST, dynamically binarized (test) | Log-likelihood | -100.8 | 48 |
| k-subset selection | BeerAdvocate, Aroma aspect, L2X experimental setup (test) | MSE | 0.0246 | 24 |
| k-subset selection | Appearance aspect data (val) | MSE | 2.51 | 24 |
| k-subset selection | Palate aspect data (val) | MSE | 2.86 | 24 |
| k-subset selection | BeerAdvocate, Taste (val/test) | MSE | 2.64 | 24 |
| Binary Latent VAE Training | Omniglot (train) | Average ELBO | 462.2 | 14 |
| Binary Latent VAE Training | MNIST (train) | Average ELBO | 688.6 | 14 |
| Binary Latent VAE Training | Fashion-MNIST (train) | Average ELBO | 196.4 | 14 |
| Log-likelihood estimation | Fashion-MNIST, dynamically binarized (test) | Log-likelihood bound (100-point) | -239 | 7 |
| Log-likelihood estimation | MNIST, non-binarized original (test) | Test log-likelihood bound (100-point) | 686.2 | 7 |
Showing 10 of 16 rows.