On Calibration of Modern Neural Networks

About

Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.

Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger • 2017
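As an illustration of the temperature scaling mentioned in the abstract, here is a minimal sketch in plain Python. It assumes the usual formulation: the logits are divided by a single scalar T > 0 before the softmax, and T is chosen to minimize negative log-likelihood on held-out validation data. The function names, the search bounds, the golden-section optimizer, and the toy data are all assumptions made for this example; they are not taken from the paper's code.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]

def softmax(logits, temperature=1.0):
    """Class probabilities after temperature scaling."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]

def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        total -= log_softmax(logits, temperature)[y]
    return total / len(labels)

def fit_temperature(logits_batch, labels, t_min=0.05, t_max=20.0, iters=80):
    """Fit T by golden-section search over log T (an illustrative optimizer choice).

    The NLL is convex in 1/T, hence unimodal in log T, so a bracketing
    search is sufficient. The bounds t_min and t_max are assumed defaults.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(t_min), math.log(t_max)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = nll(logits_batch, labels, math.exp(c))
    fd = nll(logits_batch, labels, math.exp(d))
    for _ in range(iters):
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = nll(logits_batch, labels, math.exp(c))
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = nll(logits_batch, labels, math.exp(d))
    return math.exp((a + b) / 2.0)

# Toy validation set (made up): the model always predicts class 0 with very
# high confidence, but is correct on only 7 of 10 examples.
val_logits = [[5.0, 0.0, 0.0] for _ in range(10)]
val_labels = [0] * 7 + [1] * 3

T = fit_temperature(val_logits, val_labels)
calibrated = softmax(val_logits[0], T)
```

Because dividing every logit by the same positive T does not change their ordering, the predicted class (and therefore accuracy) stays the same; only the confidence assigned to it changes.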

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100	--	691
Image Classification	Food-101	Accuracy 86.6	570
Image Classification	ImageNet LT	Top-1 Accuracy 37.9	264
Out-of-Distribution Detection	iNaturalist	AUROC 90.5	252
Long-Tailed Image Classification	ImageNet-LT (test)	--	246
Commonsense Reasoning	ARC Challenge	Accuracy 64.9	243
Out-of-Distribution Detection	Textures	AUROC 0.8539	186
Node Classification	Computers	--	169
Image Classification	ImageNet-LT (test)	--	159
Commonsense Reasoning	ARC-E	Accuracy 85.2	152

Showing 10 of 385 rows
