Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning
About
This paper studies the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how Mix-n-Match calibration strategies (i.e., ensemble and composition) can help achieve remarkably better data-efficiency and expressive power while provably maintaining the classification accuracy of the original classifier. Mix-n-Match strategies are generic in the sense that they can be used to improve the performance of any off-the-shelf calibrator. We also reveal potential issues in standard evaluation practices. Popular approaches (e.g., histogram-based expected calibration error (ECE)) may provide misleading results, especially in the small-data regime. Therefore, we propose an alternative data-efficient kernel density-based estimator for a reliable evaluation of the calibration performance and prove its asymptotic unbiasedness and consistency. Our approaches outperform state-of-the-art solutions on both the calibration and the evaluation tasks in most of the experimental settings. Our code is available at https://github.com/zhang64-llnl/Mix-n-Match-Calibration.
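To make two ideas from the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and does not reproduce the Mix-n-Match methods or the kernel density-based estimator. It assumes temperature scaling as an illustrative accuracy-preserving calibrator (the abstract does not name one), and it uses the common equal-width-bin form of histogram-based ECE. The function names `softmax`, `predict`, and `histogram_ece`, the bin count, and the toy logits are all hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of one logit vector after dividing by a positive temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, temperature=1.0):
    """Return (predicted class index, confidence) for one example."""
    probs = softmax(logits, temperature)
    confidence = max(probs)
    return probs.index(confidence), confidence


def histogram_ece(confidences, correct, n_bins=10):
    """Equal-width histogram ECE.

    Sums |accuracy - mean confidence| over bins, weighting each bin by the
    fraction of examples it holds. Bins are [k/n_bins, (k+1)/n_bins), with a
    confidence of exactly 1.0 placed in the last bin (an assumption; binning
    conventions differ between implementations).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy data, invented purely for illustration.
toy_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [1.0, 0.9, 0.8], [0.1, 0.2, 3.0]]
toy_labels = [0, 1, 2, 2]

for temperature in (1.0, 2.0):
    outputs = [predict(z, temperature) for z in toy_logits]
    preds = [p for p, _ in outputs]
    confs = [c for _, c in outputs]
    hits = [int(p == y) for p, y in zip(preds, toy_labels)]
    print(temperature, preds, histogram_ece(confs, hits))
```

Dividing the logits by a positive temperature changes the confidences but not their ordering, so the predicted classes, and therefore the accuracy, stay the same. That is the sense of "accuracy-preserving" illustrated here. With only a handful of examples most bins are empty or hold a single point, which hints at why a histogram estimate can be unreliable in the small-data regime.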
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-Tailed Image Classification | ImageNet-LT (test) | -- | 220 |
| Node Classification | Computers | -- | 143 |
| Confidence calibration | CIFAR-100-LT (test) | ECE 0.021 | 53 |
| Model Calibration | CIFAR-10 | ECE 1.64 | 40 |
| Model Calibration | SVHN | ECE 3.33 | 40 |
| Confidence calibration | Citeseer | ECE 4.15 | 36 |
| Confidence calibration | Cora | ECE 3.45 | 36 |
| Confidence calibration | Pubmed | ECE 1.63 | 36 |
| Calibration | MNIST | ECE 0.21 | 33 |
| Class-incremental learning | CIFAR-100 (test) | Accuracy 56.25 | 30 |