Improving Uncertainty Estimation through Semantically Diverse Language Generation

About

Large language models (LLMs) can suffer from hallucinations when generating text. These hallucinations impede various applications in society and industry by making LLMs untrustworthy. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Thus, it has been suggested that predictive uncertainty is one of the main causes of hallucinations. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives for an initially generated text. This approach provides a precise measure of aleatoric semantic uncertainty, detecting whether the initial text is likely to be hallucinated. Experiments on question-answering tasks demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.

Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi, Sepp Hochreiter • 2024
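
To make the idea concrete, below is a minimal Python sketch of the uncertainty score that semantic-uncertainty methods of this kind compute once the alternative generations exist: cluster the generations by semantic equivalence and take the entropy of the likelihood mass per cluster. This is an illustration of the general recipe, not the SDLG steering procedure itself, and the `semantically_equivalent` helper is a hypothetical stand-in for the entailment-based check a real implementation would use.

```python
import math

def semantically_equivalent(a: str, b: str) -> bool:
    # Stand-in for a bidirectional entailment check (e.g., an NLI model),
    # which semantic-uncertainty methods typically use; normalized string
    # comparison keeps this sketch self-contained.
    return a.strip().lower() == b.strip().lower()

def semantic_entropy(samples: list[tuple[str, float]]) -> float:
    """Entropy over semantic clusters of (text, log_likelihood) samples,
    i.e., the initial answer plus the diverse alternatives."""
    # Greedily group generations that carry the same meaning.
    clusters: list[list[tuple[str, float]]] = []
    for text, logp in samples:
        for cluster in clusters:
            if semantically_equivalent(text, cluster[0][0]):
                cluster.append((text, logp))
                break
        else:
            clusters.append([(text, logp)])

    # Aggregate likelihood mass per meaning, then normalize to a distribution.
    mass = [sum(math.exp(lp) for _, lp in c) for c in clusters]
    total = sum(mass)
    probs = [m / total for m in mass]

    # High entropy means the model spreads probability over conflicting
    # meanings, flagging the initial answer as a likely hallucination.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy usage: two paraphrases of one answer plus a contradicting alternative.
samples = [("Paris", -0.2), ("paris", -0.9), ("Lyon", -1.5)]
print(f"semantic entropy: {semantic_entropy(samples):.3f}")
```

If all generations fell into one semantic cluster, the entropy would be zero; the more likelihood mass the model places on contradicting meanings, the higher the score.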

Related benchmarks

Task                          Dataset     Metric      Result   Rank
Hallucination Detection       TriviaQA    AUROC       0.85     438
Hallucination Detection       HotpotQA    AUROC       0.7111   163
Hallucination Detection       TruthfulQA  AUC (ROC)   0.772    102
Hallucination Detection       NQ          AUC         0.6467   102
Hallucination Detection       CoQA        Mean AUROC  0.769    100
Hallucination Detection       PopQA       AUC         87.05    88
Hallucination self-detection  SimpleQA    AUROC       95.6     27
