Clustered Calibration: Representation-Aware Probability Calibration via Learned Subpopulations

About

Ensuring that predicted probabilities align with observed frequencies is critical in high-stakes domains such as clinical decision support, autonomous driving and financial risk assessment. Existing calibration methods typically apply a single global transformation or rely on post-hoc binning over predicted confidences, limiting their ability to exploit heterogeneous reliability across sub-populations. We propose Clustered Calibration, a representation-aware framework that identifies sub-populations via clustering in learned feature spaces (e.g., coverage vectors, SHAP values, CNN activations, Transformer embeddings) and fits a soft mixture of cluster-specific parametric calibrators under hierarchical shrinkage toward a global mapping. This design yields context-specific calibration while maintaining global stability. Across six tabular datasets and additional image and text benchmarks, clustered calibration consistently improves or matches strong global calibrators in terms of negative log-likelihood and Brier score, while preserving AUC and accuracy. We further show, both analytically and empirically, that fixed-bin Expected Calibration Error (ECE) can mis-rank soft, region-aware calibrators even when proper scoring rules improve, and we advocate for log-loss and Brier as more reliable bases for model selection in such settings.
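The core idea in the abstract (soft cluster membership, one parametric calibrator per cluster, shrinkage toward a global mapping) can be illustrated with a minimal sketch. This is not the authors' implementation, and every specific below is an assumption made for illustration: responsibilities come from a softmax over squared distances to given centroids, each cluster calibrator is a Platt-style map `sigmoid(a_k * z + b_k)`, the cluster maps are mixed in logit space, and the hierarchical shrinkage is approximated by a plain L2 penalty with strength `lam`. The synthetic data in `demo` are also hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(p, y, eps=1e-12):
    """Mean negative log-likelihood (log-loss) for binary labels."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log1p(-p)))

def brier(p, y):
    """Mean squared error between predicted probability and label."""
    return float(np.mean((p - y) ** 2))

def soft_assign(features, centroids, temperature=1.0):
    """Soft responsibilities: softmax over negative squared distances (assumed form)."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    scores = -d2 / temperature
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def fit_platt(z, y, steps=2000, lr=0.1):
    """Global calibrator p = sigmoid(a*z + b), fit by gradient descent on NLL."""
    a, b = 1.0, 0.0  # start at the identity map
    for _ in range(steps):
        g = sigmoid(a * z + b) - y
        a -= lr * np.mean(g * z)
        b -= lr * np.mean(g)
    return a, b

def fit_clustered(z, y, resp, global_ab, lam=0.01, steps=2000, lr=0.1):
    """Per-cluster (a_k, b_k), mixed by responsibility in logit space.

    Objective: NLL + (lam/2) * sum_k [(a_k - a_g)^2 + (b_k - b_g)^2],
    an L2 stand-in for shrinkage toward the global map (a_g, b_g).
    """
    k = resp.shape[1]
    a = np.full(k, float(global_ab[0]))
    b = np.full(k, float(global_ab[1]))
    for _ in range(steps):
        g = sigmoid((resp @ a) * z + resp @ b) - y
        grad_a = (resp * (g * z)[:, None]).mean(axis=0) + lam * (a - global_ab[0])
        grad_b = (resp * g[:, None]).mean(axis=0) + lam * (b - global_ab[1])
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def demo(seed=0, n=2000):
    """Synthetic check: two sub-populations with opposite miscalibration."""
    rng = np.random.default_rng(seed)
    cluster = rng.integers(0, 2, size=n)
    centroids = np.array([[-2.0, 0.0], [2.0, 0.0]])  # assumed known here
    features = centroids[cluster] + rng.normal(size=(n, 2))
    z_true = rng.normal(scale=1.5, size=n)
    y = (rng.random(n) < sigmoid(z_true)).astype(float)
    # Cluster 0 is overconfident, cluster 1 underconfident.
    z_raw = np.where(cluster == 0, 2.0 * z_true, 0.5 * z_true)

    resp = soft_assign(features, centroids)
    g = fit_platt(z_raw, y)
    a, b = fit_clustered(z_raw, y, resp, g)
    preds = {
        "raw": sigmoid(z_raw),
        "global": sigmoid(g[0] * z_raw + g[1]),
        "clustered": sigmoid((resp @ a) * z_raw + resp @ b),
    }
    return {name: (nll(p, y), brier(p, y)) for name, p in preds.items()}

if __name__ == "__main__":
    for name, (loss, bs) in demo().items():
        print(f"{name:10s} NLL={loss:.4f} Brier={bs:.4f}")
```

In practice the centroids would come from clustering a learned representation such as embeddings or SHAP values, and the metrics would be computed on held-out data rather than on the fitting set. Because the cluster parameters start at the global solution, a large `lam` keeps them close to it, which is the stability property the abstract attributes to shrinkage.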

Tomer Lavi, Bracha Shapira, Nadav Rappoport • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification Calibration | CIFAR100 | Classwise ECE 0.0386 | 99 |
| Calibration | Tabular datasets | NLL 0.2983 | 21 |
| Image Classification Calibration | ImageNet | Accuracy 79.19 | 15 |
| Text Classification | IMDB binary sentiment (five random splits) | NLL 0.324 | 11 |
| Image Classification Calibration | BloodMNIST | NLL 0.3121 | 9 |
| Text Classification | Emotion multi-class (five random splits) | NLL 0.157 | 9 |
| Classification Calibration | Adult | Delta NLL (%) 12 | 1 |
| Classification Calibration | Credit | Delta NLL (%) 1.55 | 1 |
| Classification Calibration | Diabetes130 | Delta NLL (%) 17 | 1 |
| Classification Calibration | LOS | Delta NLL 0.16 | 1 |
Showing 10 of 13 rows
