Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning

About

This paper studies the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how Mix-n-Match calibration strategies (i.e., ensemble and composition) can help achieve remarkably better data-efficiency and expressive power while provably maintaining the classification accuracy of the original classifier. Mix-n-Match strategies are generic in the sense that they can be used to improve the performance of any off-the-shelf calibrator. We also reveal potential issues in standard evaluation practices. Popular approaches (e.g., histogram-based expected calibration error (ECE)) may provide misleading results, especially in the small-data regime. Therefore, we propose an alternative data-efficient kernel density-based estimator for a reliable evaluation of the calibration performance and prove its asymptotic unbiasedness and consistency. Our approaches outperform state-of-the-art solutions on both the calibration and the evaluation tasks in most of the experimental settings. Our code is available at https://github.com/zhang64-llnl/Mix-n-Match-Calibration.
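To make two terms from the abstract concrete, here is a minimal Python sketch, not the authors' code. It assumes the common top-label, equal-width-bin form of histogram ECE, and it uses plain temperature scaling as an example of an accuracy-preserving calibrator. The function names `histogram_ece` and `temperature_scale`, the bin count, and the half-open binning convention are illustrative assumptions; the paper's Mix-n-Match calibrators and its kernel density-based estimator are not reproduced here.

```python
import math
from typing import List, Sequence


def histogram_ece(confidences: Sequence[float],
                  correct: Sequence[bool],
                  n_bins: int = 15) -> float:
    """Top-label ECE with equal-width bins on [0, 1] (illustrative sketch).

    Each sample falls into the bin [k/n_bins, (k+1)/n_bins); a confidence of
    exactly 1.0 is placed in the last bin. This convention is an assumption.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of samples per bin
    for c, hit in zip(confidences, correct):
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += 1.0 if hit else 0.0
        count[k] += 1
    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[k] / count[k]
        accuracy = hit_sum[k] / count[k]
        ece += (count[k] / n) * abs(accuracy - avg_conf)
    return ece


def temperature_scale(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature.

    For any temperature > 0 the ordering of the classes is unchanged, so the
    predicted label (and hence accuracy) is preserved.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    probs = temperature_scale([2.0, 1.0, 0.1], temperature=2.0)
    print("calibrated probabilities:", probs)
    print("ECE:", histogram_ece([0.95, 0.95, 0.65, 0.65],
                                [True, True, True, False], n_bins=10))
```

With few samples, many bins are empty or hold only one or two points, so the per-bin accuracy estimates become noisy. That is one way to see why a histogram estimate can mislead in the small-data regime the abstract mentions.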

Jize Zhang, Bhavya Kailkhura, T. Yong-Jin Han • 2020

Related benchmarks

Task	Dataset	Result
Long-Tailed Image Classification	ImageNet-LT (test)	--	246
Node Classification	Computers	--	169
Model Calibration	CIFAR-10	ECE 1.64	68
Calibration	USPS	ECE 4.6	57
Confidence calibration	CIFAR-100-LT (test)	ECE 0.021	53
Model Calibration	SVHN	ECE 3.33	40
Confidence calibration	Citeseer	ECE 4.15	36
Confidence calibration	Cora	ECE 3.45	36
Confidence calibration	Pubmed	ECE 1.63	36
Calibration	MNIST	ECE 0.21	33

Showing 10 of 51 rows
