Uncertainty Estimation in Autoregressive Structured Prediction
About
Uncertainty estimation is important for ensuring the safety and robustness of AI systems. While most research in the area has focused on unstructured prediction tasks, limited work has investigated general uncertainty estimation approaches for structured prediction. This work therefore investigates uncertainty estimation for autoregressive structured prediction tasks within a single unified and interpretable probabilistic ensemble-based framework. We consider: uncertainty estimation for sequence data at both the token level and the complete-sequence level; interpretations of, and applications for, various measures of uncertainty; and the theoretical and practical challenges associated with obtaining them. The work also provides baselines for token-level and sequence-level error detection, and for sequence-level out-of-domain input detection, on the WMT'14 English-French and WMT'17 English-German translation datasets and the LibriSpeech speech recognition dataset.
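To make the ensemble-based framework concrete, the sketch below shows one standard way to decompose token-level uncertainty from an ensemble: total uncertainty is the entropy of the averaged token distribution, data (aleatoric) uncertainty is the average per-model entropy, and knowledge (epistemic) uncertainty is their difference, i.e. the mutual information between the prediction and the model. This is a minimal illustration of the general decomposition, not the paper's exact implementation; the function and variable names are hypothetical.

```python
import math

def entropy(p):
    # Shannon entropy (in nats) of a categorical distribution.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def token_uncertainties(ensemble_probs):
    """Decompose uncertainty over one token position.

    ensemble_probs: list of per-model token distributions, each a
    list of probabilities over the same vocabulary.
    Returns (total, data, knowledge) uncertainty in nats.
    """
    m = len(ensemble_probs)
    k = len(ensemble_probs[0])
    # Predictive distribution: average over ensemble members.
    mean = [sum(p[i] for p in ensemble_probs) / m for i in range(k)]
    total = entropy(mean)                                 # H[ E_p ]
    data = sum(entropy(p) for p in ensemble_probs) / m    # E[ H[p] ]
    knowledge = total - data                              # mutual information
    return total, data, knowledge
```

When the ensemble members agree, the mutual-information term is near zero; when they disagree (as on out-of-domain inputs), it grows, which is what makes it useful for the out-of-domain detection baselines mentioned above.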
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Hallucination Detection | TriviaQA | -- | 265 |
| Hallucination Detection | HaluEval (test) | AUC-ROC 65.18 | 126 |
| Hallucination Detection | NQ | AUC 0.73 | 102 |
| Model Calibration | MACE | AUROC 81.8 | 84 |
| Hallucination Detection | HELM Passage Level v1.0 (test) | AUC 0.8349 | 84 |
| Hallucination Detection | HELM Sentence Level v1.0 (test) | AUC 0.7019 | 84 |
| Confidence Calibration | MACE (test) | AUROC 73.8 | 84 |
| Question Answering | 5 QA tasks | Accuracy 54.02 | 78 |
| Uncertainty Estimation | TriviaQA (test) | AUROC 78.3 | 78 |
| LLM Calibration | MACE | ECE 30.3 | 60 |