
Structured Matrix Scaling for Multi-Class Calibration

About

Post-hoc recalibration methods are widely used to ensure that classifiers provide faithful probability estimates. We argue that parametric recalibration functions based on logistic regression can be motivated from a simple theoretical setting for both binary and multiclass classification. This insight motivates the use of more expressive calibration methods beyond standard temperature scaling. For multi-class calibration, however, a key challenge lies in the increasing number of parameters introduced by more complex models, often coupled with limited calibration data, which can lead to overfitting. Through extensive experiments, we demonstrate that the resulting bias-variance tradeoff can be effectively managed by structured regularization, robust preprocessing and efficient optimization. The resulting methods lead to substantial gains over existing logistic-based calibration techniques. We provide efficient and easy-to-use open-source implementations of our methods, making them an attractive alternative to common temperature, vector, and matrix scaling implementations.
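To make the parameter-count concern concrete, here is a minimal NumPy sketch of classic matrix scaling, which recalibrates logits z as softmax(Wz + b) and is fit by minimizing log loss on held-out calibration data. For K classes, temperature scaling has 1 parameter, vector scaling (diagonal W plus bias) has 2K, and full matrix scaling has K² + K. The names `fit_matrix_scaling`, `n_params`, and `lam`, and the L2 penalty pulling W toward the identity, are hypothetical choices made for illustration. This penalty is a simple stand-in, not the authors' structured regularization, preprocessing, or open-source implementation.

```python
import numpy as np


def log_softmax(s):
    # Subtract the row max for numerical stability before exponentiating.
    s = s - s.max(axis=1, keepdims=True)
    return s - np.log(np.exp(s).sum(axis=1, keepdims=True))


def nll(logits, y):
    # Mean negative log-likelihood (log loss) of the true labels.
    return -log_softmax(logits)[np.arange(len(y)), y].mean()


def n_params(method, k):
    # Parameter counts of the classic logistic recalibration maps for k classes.
    return {"temperature": 1, "vector": 2 * k, "matrix": k * k + k}[method]


def fit_matrix_scaling(z, y, lam=1e-2, steps=500):
    """Fit softmax(z @ W.T + b) by gradient descent on NLL + lam * ||W - I||_F^2.

    Hypothetical regularizer: shrinks W toward the identity, i.e. toward the
    uncalibrated model, to limit overfitting when calibration data is scarce.
    """
    n, k = z.shape
    x = np.hstack([z, np.ones((n, 1))])               # append a bias column
    theta = np.hstack([np.eye(k), np.zeros((k, 1))])  # start at W = I, b = 0
    y_onehot = np.eye(k)[y]
    # Step size 1/L, where L bounds the objective's curvature: the softmax
    # cross-entropy Hessian w.r.t. the logits is bounded by 1/2 * I.
    curvature = 0.5 * np.linalg.eigvalsh(x.T @ x / n).max() + 2 * lam
    lr = 1.0 / curvature
    for _ in range(steps):
        p = np.exp(log_softmax(x @ theta.T))
        grad = (p - y_onehot).T @ x / n                        # NLL gradient, (k, k+1)
        grad[:, :k] += 2 * lam * (theta[:, :k] - np.eye(k))    # penalty gradient on W
        theta -= lr * grad
    return theta[:, :k], theta[:, k]


# Usage on synthetic, deliberately overconfident logits.
rng = np.random.default_rng(0)
n, k = 2000, 10
y = rng.integers(0, k, size=n)
z = rng.normal(size=(n, k))
z[np.arange(n), y] += 1.5
z *= 4.0  # inflate logits so the raw scores are overconfident

W, b = fit_matrix_scaling(z, y)
print(n_params("matrix", k))  # 110
print(nll(z, y), nll(z @ W.T + b, y))
```

The usage fits and scores on the same data only to keep the sketch short. In practice the fitted map should be evaluated on a separate test split, which is exactly where the K² + K parameters of full matrix scaling can overfit when calibration data is limited.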

Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Model Calibration | CIFAR10 (test) | -- | -- | 61
Multi-class Calibration | CIFAR-100 logits (test) | LogLoss Absolute Improvement | -0.97 | 60
Multi-class Calibration | ImageNet (test) | NLL Improvement (Absolute) | -0.049 | 12
Multi-class Post-hoc Calibration | TabRepo 1365 multi-class experiments (test) | Brier Score Difference | -0.0046 | 9
Binary Calibration | 2184 binary experiments (test) | Brier Score Gap | -0.0034 | 6
