
Query-Level Uncertainty in Large Language Models

About

It is important for Large Language Models (LLMs) to be aware of the boundary of their knowledge, distinguishing queries they can confidently answer from those that lie beyond their capabilities. Such awareness enables models to perform adaptive inference, such as invoking retrieval-augmented generation (RAG), engaging in slow and deep thinking, or abstaining from answering when appropriate. These mechanisms are key to developing efficient and trustworthy AI. In this work, we propose a method to detect knowledge boundaries via Query-Level Uncertainty, which estimates if a model is capable of answering a given query before generating any tokens, thus avoiding the generation cost. To this end, we propose a novel, training-free method called Internal Confidence, which leverages self-evaluations across layers and tokens to provide a reliable signal of uncertainty. Empirical studies on both factual question answering and mathematical reasoning tasks demonstrate that our Internal Confidence outperforms several baselines in quality of confidence while being computationally cheaper. Furthermore, we demonstrate its benefits in adaptive inference settings, showing that for RAG and model cascading it reduces inference costs while preserving overall performance.
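The adaptive inference idea described above can be illustrated with a small routing sketch: estimate a confidence score for the query before generating anything, then either answer directly or hand the query to a more expensive path such as RAG or a larger model. This is a minimal sketch and not the authors' implementation. The names `route_query`, `toy_confidence`, `toy_direct`, and `toy_rag` are hypothetical, the threshold of 0.5 is an arbitrary assumption, and the toy confidence function is a stand-in for a real query-level uncertainty estimator such as Internal Confidence.

```python
from typing import Callable, Tuple


def route_query(
    query: str,
    confidence_fn: Callable[[str], float],
    answer_directly: Callable[[str], str],
    fallback: Callable[[str], str],
    threshold: float = 0.5,  # assumed value; would be tuned on held-out data
) -> Tuple[str, str]:
    """Pick an inference path from a pre-generation confidence score.

    Returns a (route, answer) pair where route is "direct" or "fallback".
    """
    confidence = confidence_fn(query)
    if confidence >= threshold:
        # The model is judged likely to know the answer: skip the costly path.
        return "direct", answer_directly(query)
    # Otherwise use the expensive path (e.g. RAG or a larger model).
    return "fallback", fallback(query)


# Hypothetical stand-ins so the sketch runs without any model.
def toy_confidence(query: str) -> float:
    # Placeholder heuristic, NOT a real uncertainty estimator.
    return 0.9 if "capital" in query.lower() else 0.2


def toy_direct(query: str) -> str:
    return f"[model answer to: {query}]"


def toy_rag(query: str) -> str:
    return f"[retrieval-augmented answer to: {query}]"


if __name__ == "__main__":
    for q in ["What is the capital of France?", "Who won the 1987 regional chess open?"]:
        route, answer = route_query(q, toy_confidence, toy_direct, toy_rag)
        print(route, answer)
```

Because the score is computed before any tokens are generated, the fallback path is only paid for on queries the estimator flags as uncertain, which is where the cost savings described in the abstract would come from.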

Lihu Chen, Gerard de Melo, Fabian M. Suchanek, Gaël Varoquaux • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Knowledge Evaluation | Natural Questions (NQ) (Evaluation) | Accuracy 64 | 45 |
| Uncertainty Estimation (Factual QA) | TriviaQA 1,000 samples (val) | AUROC 71.9 | 27 |
| Uncertainty Estimation (Factual QA) | SciQ 1,000 samples (val) | AUROC 62.6 | 27 |
| Uncertainty Estimation (Mathematical Reasoning) | GSM8K 1,000 samples (val) | AUROC 0.668 | 27 |
| Uncertainty Estimation | TruthfulQA | AUROC 63.2 | 24 |
| Uncertainty Estimation | SimpleQA, MuSiQue, and TruthfulQA Average | AUROC 61 | 24 |
| Uncertainty Estimation | SimpleQA | AUROC 61.2 | 24 |
| Uncertainty Estimation | MuSiQue | AUROC 65.5 | 24 |
| Knowledge gap detection | HQA | Accuracy 74.7 | 18 |
| Knowledge gap detection | MATH | Accuracy (Knowledge Gap) 71.5 | 18 |
Showing 10 of 13 rows
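
Most rows above report AUROC, which measures how well a confidence score separates queries the model answers correctly from those it gets wrong. As a hedged illustration, AUROC can be computed as the fraction of (correct, incorrect) pairs in which the correct example received the higher confidence, with ties counted as half. The function name `auroc` and the toy data below are assumptions for illustration only and are not taken from the paper or its benchmarks.

```python
from typing import Sequence


def auroc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Pairwise AUROC: P(confidence of a correct item > confidence of an incorrect item).

    Ties contribute 0.5. Returns a value in [0, 1]; 0.5 is chance level.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up confidences and correctness labels for four queries.
    scores = [0.9, 0.8, 0.3, 0.6]
    labels = [True, False, False, True]
    print(auroc(scores, labels))  # 3 of 4 pairs are ordered correctly -> 0.75
```

The function returns a fraction in [0, 1]. Several entries in the table appear to be on a 0 to 100 scale, so a value such as 0.75 would correspond to 75 in that convention.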
