Calibration without Ground Truth

About

Villalobos et al. [2024] predict that publicly available human text will be exhausted within the next decade. Thus, improving models without access to ground-truth labels becomes increasingly important. We propose a label-free post-processing framework that improves a strong but miscalibrated model using a weaker yet better-calibrated reference. Our framework guarantees a strict performance improvement under any proper loss. Our approach is based on a characterization of when strict improvement is possible: when the strong and reference models are not mutually calibrated. We formalize this condition, connect it to arbitrage and no-trade results from economics, and develop an efficient Bregman projection algorithm that guarantees worst-case loss reduction without labels. Experiments on representative LLMs across varying scales demonstrate that our label-free method significantly reduces proper losses and calibration errors, achieving performance competitive with supervised baselines.

Yuqing Kong, Mingyu Song, Yizhou Wang, Yifan Wu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Commonsense Reasoning | CommonsenseQA | BS 0.1262 | 54
Language Understanding | MMLU-Redux | Base Score 0.3571 | 24
Knowledge Evaluation | MMLU-Redux | Brier Score 0.1232 | 18
Multiple-choice Question Answering | MMLU Redux (test) | BS 0.1232 | 12
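
The Brier score listed among the results is one example of the proper losses the abstract refers to. The following is a minimal sketch of how a multiclass Brier score can be computed and why it is proper, meaning that reporting the true distribution minimizes the expected loss. It is not the authors' code or their Bregman projection algorithm. The function names and the unnormalized sum-over-classes convention are assumptions made for illustration, and the scores in the table may use a different normalization.

```python
import numpy as np


def brier_score(probs, labels):
    """Mean multiclass Brier score (assumed convention: squared error
    summed over classes, averaged over examples)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    one_hot = np.eye(probs.shape[1])[labels]  # one-hot encode the labels
    return float(np.mean(np.sum((probs - one_hot) ** 2, axis=1)))


def expected_brier(report, truth):
    """Expected Brier score of reporting `report` when the label is
    drawn from the distribution `truth` (hypothetical helper)."""
    report = np.asarray(report, dtype=float)
    truth = np.asarray(truth, dtype=float)
    k = len(truth)
    # Loss incurred for each possible realized label.
    losses = np.sum((report[None, :] - np.eye(k)) ** 2, axis=1)
    return float(truth @ losses)


if __name__ == "__main__":
    truth = [0.7, 0.3]
    honest = expected_brier(truth, truth)
    overconfident = expected_brier([0.9, 0.1], truth)
    # Properness: the honest report has the lower expected loss.
    print(honest < overconfident)
```

Under this convention, an overconfident report is penalized in expectation relative to the true distribution, which is the property that makes lowering a proper loss a meaningful target for calibration.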