Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

About

Class probabilities predicted by most multiclass classifiers are uncalibrated, often tending towards over-confidence. With neural networks, calibration can be improved by temperature scaling, a method to learn a single corrective multiplicative factor for inputs to the last softmax layer. On non-neural models the existing methods apply binary calibration in a pairwise or one-vs-rest fashion. We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification. It is easily implemented with neural nets since it is equivalent to log-transforming the uncalibrated probabilities, followed by one linear layer and softmax. Experiments demonstrate improved probabilistic predictions according to multiple measures (confidence-ECE, classwise-ECE, log-loss, Brier score) across a wide range of datasets and classifiers. Parameters of the learned Dirichlet calibration map provide insights to the biases in the uncalibrated model.

Meelis Kull, Miquel Perello-Nieto, Markus K\"angsepp, Telmo Silva Filho, Hao Song, Peter Flach• 2019

Related benchmarks

Task	Dataset	Result
Image Classification Calibration	CIFAR100	Classwise ECE0.1315	99
Model Calibration	CIFAR10 (test)	--	68
Confidence calibration	Dermatology	Confidence Calibration Error0.024	66
Classification	vehicle	Accuracy82.9	65
Multi-class Calibration	CIFAR-100 logits (test)	LogLoss Absolute Improvement0.166	60
Classification	CAR	Accuracy98.8	47
Confidence calibration	CAR	Calibration Error1.1	44
Confidence calibration	Glass	Calibration Error0.102	44
Confidence calibration	vehicle	Calibration Error0.063	44
Confidence calibration	Cora	ECE0.0364	36

Showing 10 of 35 rows

Other info

Follow for update

@wizwand_team Discord