
Stochastic Gradient Hamiltonian Monte Carlo

About

Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system; such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem, we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.

Tianqi Chen, Emily B. Fox, Carlos Guestrin • 2014
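The friction-corrected update described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's reference implementation: it assumes an identity mass matrix, a hypothetical noisy-gradient callback `grad_u_noisy` standing in for a minibatch gradient of the negative log posterior, and a gradient-noise estimate `beta_hat` (set to 0 here, as the paper notes is a common simple choice).

```python
import numpy as np

def sghmc_sample(grad_u_noisy, theta0, n_steps, eta=0.01, alpha=0.1,
                 beta_hat=0.0, rng=None):
    """Sketch of SGHMC with friction (second-order Langevin dynamics).

    grad_u_noisy(theta): noisy estimate of grad U(theta), U = -log p(theta).
    eta: step size; alpha: friction coefficient; beta_hat: estimated
    contribution of gradient noise (assumed known here for illustration).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    theta, v = float(theta0), 0.0
    # injected noise scaled so friction minus noise estimate keeps the
    # target distribution invariant: N(0, 2*(alpha - beta_hat)*eta)
    noise_scale = np.sqrt(2.0 * (alpha - beta_hat) * eta)
    samples = []
    for _ in range(n_steps):
        theta += v
        # friction term -alpha*v counteracts the extra variance that the
        # stochastic gradient would otherwise add to the dynamics
        v += (-eta * grad_u_noisy(theta) - alpha * v
              + noise_scale * rng.standard_normal())
        samples.append(theta)
    return np.array(samples)

# Toy check: target is a standard normal, so U(theta) = theta^2 / 2 and
# grad U = theta; synthetic noise mimics a minibatch gradient estimate.
rng = np.random.default_rng(42)
noisy_grad = lambda th: th + 0.5 * rng.standard_normal()
samples = sghmc_sample(noisy_grad, theta0=0.0, n_steps=50000, rng=rng)
```

After discarding burn-in, the sample mean and variance should sit near the target's 0 and 1; dropping the friction term (`alpha = 0`) in the same sketch is exactly the "naive" stochastic-gradient HMC whose variance the abstract warns can grow arbitrarily.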

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Character-level Language Modeling | Shakespeare modern | Accuracy | 55.63 | 48 |
| Binary Classification | CIFAR-10 airplane-automobile (test) | Posterior Expected Log Loss | 0.5694 | 20 |
| Binary Classification | CIFAR-10 cat-dog (test) | Posterior Expected Log Loss | 0.8329 | 20 |
| Binary Classification | CIFAR-10 deer-horse (test) | Posterior Expected Log Loss | 0.7567 | 20 |
| Binary Classification | MNIST digits 7 and 9 (test) | Posterior Expected Log Loss | 0.2619 | 19 |
| Multiclass Classification | LETTER (test) | Posterior Expected Log Loss | 0.3299 | 18 |
| Multiclass Classification | acoustic (test) | Posterior Expected Log Loss | 0.6338 | 18 |
| Regression | Airfoil (3 train-test splits) | LPPD | -0.176 | 7 |
| Regression | Bikesharing (3 train-test splits) | LPPD | -0.092 | 7 |
| Regression | Energy (3 train-test splits) | LPPD | 1.063 | 7 |

Showing 10 of 17 rows.
