
Teaching Models to Express Their Uncertainty in Words

About

We show that a GPT-3 model can learn to express uncertainty about its own answers in natural language, without using model logits. When given a question, the model generates both an answer and a level of confidence (e.g. "90% confidence" or "high confidence"). These levels map to probabilities that are well calibrated. The model also remains moderately calibrated under distribution shift, and is sensitive to uncertainty in its own answers, rather than imitating human examples. To our knowledge, this is the first time a model has been shown to express calibrated uncertainty about its own answers in natural language. For testing calibration, we introduce the CalibratedMath suite of tasks. We compare the calibration of uncertainty expressed in words ("verbalized probability") to uncertainty extracted from model logits. Both kinds of uncertainty generalize calibration under distribution shift. We also provide evidence that GPT-3's ability to generalize calibration depends on pre-trained latent representations that correlate with epistemic uncertainty over its answers.
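The abstract describes mapping verbalized confidence levels (e.g. "90% confidence") to probabilities and checking whether those probabilities are well calibrated. A minimal sketch of how such calibration can be quantified is shown below, using two standard metrics (Brier score and a simple binned expected calibration error). The (confidence, correctness) pairs are hypothetical and not taken from the paper.

```python
# Sketch of measuring calibration of verbalized confidences.
# The (confidence, correct) pairs below are hypothetical examples.

def brier_score(confidences, outcomes):
    """Mean squared error between stated confidence and 0/1 correctness."""
    n = len(confidences)
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / n

def calibration_error(confidences, outcomes, bins=5):
    """Binned expected calibration error: |mean confidence - accuracy| per bin,
    weighted by the fraction of examples landing in that bin."""
    buckets = {}
    for c, o in zip(confidences, outcomes):
        b = min(int(c * bins), bins - 1)  # clip c == 1.0 into the top bin
        buckets.setdefault(b, []).append((c, o))
    n = len(confidences)
    ece = 0.0
    for pairs in buckets.values():
        avg_conf = sum(c for c, _ in pairs) / len(pairs)
        accuracy = sum(o for _, o in pairs) / len(pairs)
        ece += (len(pairs) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical model outputs: verbalized confidence, and whether the answer was correct.
confs = [0.9, 0.9, 0.7, 0.5, 0.3, 0.9, 0.5, 0.1]
correct = [1, 1, 1, 0, 0, 1, 1, 0]

print(round(brier_score(confs, correct), 3))
print(round(calibration_error(confs, correct), 3))
```

A perfectly calibrated model that says "90% confidence" would be right on 90% of those answers; both metrics are 0 in that ideal case, and lower is better.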

Stephanie Lin, Jacob Hilton, Owain Evans · 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Hallucination Detection | TriviaQA | AUROC | 0.552 | 265
Hallucination Detection | MATH | Mean AUROC | 54.36 | 72
Hallucination Detection | GSM8K | AUROC | 50 | 53
Hallucination Detection | CommonsenseQA | Mean AUROC | 0.5476 | 48
Hallucination Detection | Belebele | Mean AUROC | 0.556 | 48
Hallucination Detection | CoQA | Mean AUROC | 0.54 | 48
Hallucination Detection | Average Cross-domain | Mean AUROC | 0.5469 | 48
Hallucination Detection | SVAMP | Mean AUROC | 54.2 | 48
Hallucination Detection | TruthfulQA | AUC (ROC) | 0.5 | 47
Calibration | MMLU | Brier Score | 0.0559 | 42

Showing 10 of 32 rows.
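The hallucination-detection rows above report AUROC: the probability that a randomly chosen correct answer receives a higher confidence score than a randomly chosen incorrect one, so 0.5 is chance level and 1.0 is perfect separation. A minimal sketch of the computation, on hypothetical scores, is:

```python
# Sketch: AUROC of a confidence signal for hallucination detection.
# Scores near 0.5 (as in several benchmark rows) mean the confidence
# barely separates correct answers from hallucinations.

def auroc(scores, labels):
    """Probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical confidences; label 1 = correct answer, 0 = hallucination.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

print(round(auroc(scores, labels), 3))
```

This pairwise-comparison form is equivalent to the area under the ROC curve; libraries such as scikit-learn compute the same quantity via `sklearn.metrics.roc_auc_score`.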
