Bayesian Deep Learning and a Probabilistic Perspective of Generalization

About

The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.

Andrew Gordon Wilson, Pavel Izmailov • 2020
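
The abstract describes deep ensembles as a mechanism for approximate Bayesian marginalization. As a rough illustration only, and not the authors' code, the sketch below treats each independently trained ensemble member as one sample of the weights and approximates the Bayesian model average p(y | x, D) = ∫ p(y | x, w) p(w | D) dw by an equal-weight average of the members' predictive probabilities. The function names `softmax` and `ensemble_predictive`, the array layout, and the toy logits are assumptions introduced for this example.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ensemble_predictive(member_logits):
    """Equal-weight average of per-member class probabilities.

    member_logits has shape (n_members, n_examples, n_classes); each slice
    along the first axis holds the logits of one trained network
    (hypothetical layout chosen for this sketch).
    """
    probs = softmax(member_logits, axis=-1)  # p(y | x, w_k) for each member k
    return probs.mean(axis=0)                # Monte Carlo estimate of the average


# Toy usage: random logits standing in for 5 trained networks,
# 4 inputs, and 3 classes.
rng = np.random.default_rng(0)
member_logits = rng.normal(size=(5, 4, 3))
bma_probs = ensemble_predictive(member_logits)
```

Averaging probabilities rather than logits is the design choice that matters here: the result is a mixture of the members' predictive distributions, which is what marginalizing over the weights produces, whereas averaging logits would yield a single re-normalized distribution instead.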

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-10 (test)	Accuracy 88.49	3381
Image Classification	CIFAR10C level 1-5	--	18
Regression	UCI Naval	Sharpness Score 3.018	12
Regression	UCI Concrete	Sharpness 1.306	12
Regression	UCI Wine	W1 Score 0.177	6
Regression	Energy (UCI)	Ensemble CRPS 0.344	6
Regression	Kin8nm UCI	Ensemble CRPS 0.039	6
Regression	Wine (UCI)	Ensemble CRPS 0.41	6
Weather forecasting	ClimaX 5.625°	ACC 96.48	6
Regression	UCI Energy	Sharp/SA 1.211	6

Showing 10 of 32 rows
