ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables
About
Estimating gradients for binary variables is a task that arises frequently across domains, for example when training discrete latent variable models. A commonly used approach is REINFORCE-based Monte Carlo estimation with either independent samples or pairs of negatively correlated (antithetic) samples. To make better use of more than two samples, we propose ARMS, an Antithetic REINFORCE-based Multi-Sample gradient estimator. ARMS uses a copula to generate any number of mutually antithetic samples. It is unbiased, has low variance, and generalizes both DisARM, which we show to be ARMS with two samples, and the leave-one-out REINFORCE (LOORF) estimator, which is ARMS with uncorrelated samples. We evaluate ARMS on several datasets for training generative models, and our experimental results show that it outperforms competing methods. We also develop a version of ARMS for optimizing the multi-sample variational bound, and show that it outperforms both VIMCO and DisARM. The code is publicly available.
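The key ingredient is drawing K Bernoulli samples that are pairwise negatively correlated. The sketch below illustrates one standard way to do this with a Gaussian copula: draw exchangeable normals with pairwise correlation -1/(K-1), map them to correlated uniforms through the normal CDF, and threshold at p. This is a minimal illustration of antithetic sampling, not the exact copula construction or the weighted estimator derived in the ARMS paper; the function name and the choice of a Gaussian copula are assumptions for the example.

```python
import math
import numpy as np

def antithetic_bernoulli(p, K, rng):
    """Draw K mutually antithetic Bernoulli(p) samples via a Gaussian copula.

    Illustrative sketch (not the paper's exact construction): exchangeable
    standard normals with pairwise correlation -1/(K-1), the most negative
    value an exchangeable correlation matrix allows, are pushed through the
    normal CDF to get negatively correlated uniforms, then thresholded at p.
    """
    e = rng.standard_normal(K)
    # Centering induces pairwise correlation -1/(K-1); rescale to unit variance.
    z = (e - e.mean()) / math.sqrt(1.0 - 1.0 / K)
    # Standard normal CDF via erf maps each z_k to a Uniform(0, 1) marginal.
    u = np.array([0.5 * (1.0 + math.erf(zk / math.sqrt(2.0))) for zk in z])
    return (u < p).astype(float)

rng = np.random.default_rng(0)
p, K = 0.5, 4
samples = np.array([antithetic_bernoulli(p, K, rng) for _ in range(20000)])
print(samples.mean())                                    # close to p
print(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])   # negative
```

Each marginal is still Bernoulli(p), so any REINFORCE-style estimator built on these samples remains unbiased, while the negative pairwise correlation is what drives the variance reduction when the per-sample gradient terms are averaged.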
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Log-likelihood estimation | MNIST dynamically binarized (test) | Log-Likelihood | -99.08 | 48 |
| Binary Latent VAE Training | Omniglot (train) | Average ELBO | 458 | 14 |
| Binary Latent VAE Training | MNIST (train) | Average ELBO | 683.5 | 14 |
| Binary Latent VAE Training | Fashion-MNIST (train) | Average ELBO | 193.1 | 14 |
| Log-likelihood estimation | MNIST non-binarized original (test) | Test Log-Likelihood Bound (100-point) | 688.6 | 7 |
| Log-likelihood estimation | Fashion-MNIST non-binarized original (test) | Log-Likelihood Bound | 174.1 | 7 |
| Log-likelihood estimation | Omniglot non-binarized original (test) | Test Log-Likelihood Bound | 320.4 | 7 |
| Log-likelihood estimation | Omniglot dynamically binarized (test) | Test Log-Likelihood Bound | -116.8 | 7 |
| Log-likelihood estimation | Fashion-MNIST dynamically binarized (test) | Log-Likelihood Bound (100-point) | -238.2 | 7 |