Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Verified Uncertainty Calibration

About

Applications such as weather forecasting and personalized medicine demand models that output calibrated probability estimates---those representative of the true likelihood of a prediction. Most models are not calibrated out of the box but are recalibrated by post-processing model outputs. We find in this work that popular recalibration methods like Platt scaling and temperature scaling are (i) less calibrated than reported, and (ii) current techniques cannot estimate how miscalibrated they are. An alternative method, histogram binning, has measurable calibration error but is sample inefficient---it requires $O(B/\epsilon^2)$ samples, compared to $O(1/\epsilon^2)$ for scaling methods, where $B$ is the number of distinct probabilities the model can output. To get the best of both worlds, we introduce the scaling-binning calibrator, which first fits a parametric function to reduce variance and then bins the function values to actually ensure calibration. This requires only $O(1/\epsilon^2 + B)$ samples. Next, we show that we can estimate a model's calibration error more accurately using an estimator from the meteorological community---or equivalently measure its calibration error with fewer samples ($O(\sqrt{B})$ instead of $O(B)$). We validate our approach with multiclass calibration experiments on CIFAR-10 and ImageNet, where we obtain a 35% lower calibration error than histogram binning and, unlike scaling methods, guarantees on true calibration. In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators. We implement all these methods in a Python library: https://pypi.org/project/uncertainty-calibration

Ananya Kumar, Percy Liang, Tengyu Ma• 2019

Related benchmarks

TaskDatasetResultRank
Long-Tailed Image ClassificationImageNet-LT (test)--
220
Confidence calibrationCIFAR-100-LT (test)
ECE0.0596
53
Model CalibrationCIFAR-10
ECE2.49
40
Model CalibrationSVHN
ECE7
40
CalibrationMNIST
ECE0.25
33
CalibrationDigital-S
ECE7.67
27
CalibrationUSPS
ECE5.12
27
Uncertainty CalibrationCIFAR-10.1
ECE0.0563
27
Uncertainty CalibrationCIFAR-F
ECE8.54
27
Uncertainty CalibrationCIFAR-10.1 C
ECE33.16
27
Showing 10 of 10 rows

Other info

Follow for update