Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
About
Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
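The core idea — jointly training a control variate to reduce the variance of a score-function gradient estimator without biasing it — can be sketched on a toy discrete problem of the kind the paper uses: minimizing E[(b − t)²] for b ~ Bernoulli(σ(θ)). In this simplified sketch the control variate is a single learned scalar baseline rather than the paper's neural network, and the objective constant, learning rates, and step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(b):
    # Toy black-box objective: minimize E[(b - 0.499)^2] over the
    # Bernoulli parameter. The optimum pushes p toward 0.
    return (b - 0.499) ** 2

theta = 0.0   # logit of the Bernoulli parameter (the model parameter)
c = 0.0       # scalar control variate, trained jointly (stand-in for c_phi)
lr_theta, lr_c = 0.1, 0.1

for step in range(20000):
    p = sigmoid(theta)
    b = float(rng.random() < p)   # sample b ~ Bernoulli(p)
    dlogp = b - p                 # d/dtheta log p(b | theta) for a logit parameter
    g = (f(b) - c) * dlogp        # score-function estimator with baseline; unbiased
    theta -= lr_theta * g         # gradient step on E[f(b)]

    # Train the control variate to reduce the estimator's variance:
    # d/dc E[g^2] is estimated from a single sample via d(g^2)/dc = -2 g dlogp.
    c -= lr_c * (-2.0 * g * dlogp)

print(sigmoid(theta), c)
```

Because E[dlogp] = 0, any value of c leaves the estimator unbiased; optimizing c changes only its variance. That separation — unbiasedness for free, variance minimized by gradient descent on the control variate's parameters — is the property the full method exploits with a differentiable neural surrogate in place of the scalar baseline.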
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud • 2017
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Log-likelihood estimation | MNIST, dynamically binarized (test) | Log-likelihood | -100.8 | 48 |
| k-subset selection | BeerAdvocate, Aroma aspect, L2X experimental setup (test) | MSE | 0.0246 | 24 |
| k-subset selection | Appearance aspect data (val) | MSE | 2.51 | 24 |
| k-subset selection | Palate aspect data (val) | MSE | 2.86 | 24 |
| k-subset selection | BeerAdvocate, Taste (val/test) | MSE | 2.64 | 24 |
| Binary Latent VAE Training | Omniglot (train) | Average ELBO | 462.2 | 14 |
| Binary Latent VAE Training | MNIST (train) | Average ELBO | 688.6 | 14 |
| Binary Latent VAE Training | Fashion-MNIST (train) | Average ELBO | 196.4 | 14 |
| Log-likelihood estimation | Fashion-MNIST, dynamically binarized (test) | Log-likelihood bound (100-point) | -239 | 7 |
| Log-likelihood estimation | MNIST, non-binarized original (test) | Test log-likelihood bound (100-point) | 686.2 | 7 |
Showing 10 of 16 rows.