
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

About

Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
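To make the idea concrete, here is a minimal NumPy sketch of a score-function gradient estimator combined with a control variate, in the spirit of the estimator the abstract describes. Everything specific here is an illustrative assumption, not the paper's setup: the sampling distribution is taken to be `N(theta, 1)`, the black-box target is `f(z) = z**2`, and a fixed quadratic surrogate `c(z) = a*z**2 + b*z` stands in for the neural network that the paper trains jointly to minimize estimator variance.

```python
import numpy as np

def lax_style_estimate(theta, a, b, n_samples, seed=0):
    """Single-sample gradient estimates of d/dtheta E[f(z)], z ~ N(theta, 1).

    Combines the score-function (REINFORCE) term with a surrogate
    c(z) = a*z**2 + b*z used as a control variate, whose own gradient
    is added back via the reparameterization z = theta + eps.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    z = theta + eps                  # reparameterized sample
    f = z ** 2                       # black-box function (assumed here)
    c = a * z ** 2 + b * z           # surrogate / control variate
    c_grad = 2 * a * z + b           # d c(z)/d theta, since dz/dtheta = 1
    score = eps                      # d log N(z; theta, 1) / d theta
    g = (f - c) * score + c_grad     # unbiased per-sample estimates
    return g.mean(), g.var()

theta = 1.0
# Plain score-function estimator (surrogate switched off):
mean_plain, var_plain = lax_style_estimate(theta, a=0.0, b=0.0, n_samples=200_000)
# Same estimator with a well-matched surrogate:
mean_cv, var_cv = lax_style_estimate(theta, a=1.0, b=0.0, n_samples=200_000)
# True gradient of E[(theta + eps)**2] is 2*theta = 2.0. Both estimates
# agree in expectation; the control variate sharply reduces the variance.
```

In the paper's framework the surrogate's parameters are not hand-picked as above but optimized by gradient descent on the variance of the estimator itself, alongside the model parameters or policy.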

Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Log-likelihood estimation | MNIST, dynamically binarized (test) | Log-Likelihood | -100.8 | 48 |
| k-subset selection | BeerAdvocate Aroma aspect, L2X experimental setup (test) | MSE | 0.0246 | 24 |
| k-subset selection | Appearance aspect data (val) | MSE | 2.51 | 24 |
| k-subset selection | Palate aspect data (val) | MSE | 2.86 | 24 |
| k-subset selection | BeerAdvocate Taste (val test) | MSE | 2.64 | 24 |
| Binary Latent VAE Training | Omniglot (train) | Average ELBO | 462.2 | 14 |
| Binary Latent VAE Training | MNIST (train) | Average ELBO | 688.6 | 14 |
| Binary Latent VAE Training | Fashion-MNIST (train) | Average ELBO | 196.4 | 14 |
| Log-likelihood estimation | Fashion-MNIST, dynamically binarized (test) | Log-Likelihood Bound (100-point) | -239 | 7 |
| Log-likelihood estimation | MNIST, non-binarized original (test) | Test Log-Likelihood Bound (100-point) | 686.2 | 7 |

Showing 10 of 16 rows.
