Unified Approach for Weakly Supervised Multicalibration
About
Multicalibration requires predicted scores to agree with label probabilities across rich families of subgroups and score-dependent tests, but existing methods require clean input-label pairs for evaluation and post-processing. This assumption fails in weakly supervised learning (WSL) regimes (including positive-unlabeled, unlabeled-unlabeled, and positive-confidence learning), where clean labels are costly or unavailable even though reliable uncertainty estimates may be crucial. We address this gap by developing estimators of multicalibration error and post-hoc correction methods for WSL settings in which clean input-label pairs are unavailable. We propose a unified framework for estimating and correcting multicalibration under weak supervision by combining contamination-matrix risk rewrites with witness-based calibration constraints, yielding corrected multicalibration moments with finite-sample guarantees. We further propose weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm under weak supervision. Finally, we conduct experiments across multiple weak-supervision settings to evaluate multicalibration behavior and offer empirical insight into uncertainty estimation under weak supervision.
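To make the idea of a risk rewrite concrete, here is a minimal sketch for the positive-unlabeled case only. It is not the paper's estimator and not WLMC. It assumes a known class prior, a binary subgroup indicator, and a score-bin witness, and the name `pu_multicalibration_moment` is hypothetical. It relies on the standard PU identity E[h(x) y] = π E[h(x) | y = 1], which lets the moment E[g(x) w(f(x)) (y - f(x))] be estimated from positive and unlabeled samples without clean input-label pairs.

```python
import numpy as np


def pu_multicalibration_moment(scores_pos, group_pos, scores_unl, group_unl,
                               class_prior, lo, hi):
    """Estimate E[g(x) * 1{lo <= f(x) < hi} * (y - f(x))] from PU data.

    Illustrative assumptions: scores_pos/group_pos come from p(x | y=1),
    scores_unl/group_unl come from the marginal p(x), and
    class_prior = P(y=1) is known.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_unl = np.asarray(scores_unl, dtype=float)
    group_pos = np.asarray(group_pos, dtype=float)
    group_unl = np.asarray(group_unl, dtype=float)

    # Test function h(x) = g(x) * 1{f(x) in [lo, hi)} on each sample set.
    h_pos = group_pos * ((scores_pos >= lo) & (scores_pos < hi))
    h_unl = group_unl * ((scores_unl >= lo) & (scores_unl < hi))

    # E[h(x) y] is rewritten as class_prior * E_{x | y=1}[h(x)].
    label_term = class_prior * np.mean(h_pos)
    # E[h(x) f(x)] needs no labels, so unlabeled data suffices.
    score_term = np.mean(h_unl * scores_unl)
    return label_term - score_term


# Synthetic check with a calibrated predictor f(x) = x, so the true moment is 0.
rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(size=n)
y = rng.uniform(size=n) < x            # P(y=1 | x) = x, hence P(y=1) = 0.5
g = rng.integers(0, 2, size=n)         # arbitrary subgroup indicator
x_unl = rng.uniform(size=n)            # independent unlabeled draw from p(x)
g_unl = rng.integers(0, 2, size=n)

est = pu_multicalibration_moment(x[y], g[y], x_unl, g_unl,
                                 class_prior=0.5, lo=0.5, hi=1.0)
print(est)  # small in magnitude, since the predictor is calibrated
```

Other weak-supervision settings would need a different rewrite of the label term. The score term stays the same whenever samples from the marginal input distribution are available.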
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Classification Calibration | MEPS (test) | Oracle ECE 1.74 | 150 |
| Income Prediction | ACSIncome (test) | Oracle ECE 0.92 | 150 |
| Classification | CreditDefault (test) | Oracle ECE 1.26 | 125 |
| Calibration | HMDA (test) | Oracle ECE 1.48 | 100 |
| Image Classification | CelebA (test) | Accuracy 92.11 | 82 |
| Classification | HMDA (test) | Oracle ECE 2.97 | 50 |
| Tabular Classification | CreditDefault (test) | Oracle ECE 4.77 | 25 |
| Calibration | CelebA ImageResNet (test) | ECE (Oracle Estimate) 1.01 | 20 |
| Toxicity Detection | CivilComments BERT (test) | Oracle ECE 1.27 | 20 |
| Calibration | CivilComments BERT (test) | ECE (Oracle Estimate) 1.35 | 5 |