GRADE: Probing Knowledge Gaps in LLMs through Gradient Subspace Dynamics

About

Detecting whether a model's internal knowledge is sufficient to correctly answer a given question is a fundamental challenge in deploying LLMs responsibly. Beyond verbalised confidence from LLM self-reports, more recent methods probe model internals, such as the hidden states of the response tokens, to capture how much knowledge is activated. We argue that such activated knowledge may not align with what the query requires, e.g., it may capture stylistic and length-related features that are uninformative for answering the query. To close this gap, we propose GRADE (Gradient Dynamics for knowledge gap detection), which quantifies the knowledge gap via the cross-layer ratio of the rank of the gradient subspace to that of the corresponding hidden-state subspace. This is motivated by the property of gradients as estimators of the knowledge updates required for a given target. We validate GRADE on six benchmarks, demonstrating its effectiveness and robustness to input perturbations. In addition, we present a case study showing how the gradient chain can generate interpretable explanations of knowledge gaps for long-form answers.
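The core quantity described above can be sketched numerically. The following is a minimal, hypothetical illustration of a per-layer rank ratio between a gradient matrix and a hidden-state matrix; the function names, the SVD-threshold definition of effective rank, the tolerance value, and the averaging across layers are all assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def effective_rank(M: np.ndarray, tol: float = 1e-2) -> int:
    """Count singular values above tol * largest singular value."""
    s = np.linalg.svd(M, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s > tol * s[0]))

def grade_score(hidden_states, gradients, tol: float = 1e-2) -> float:
    """Average per-layer ratio of gradient rank to hidden-state rank.

    hidden_states, gradients: lists of (tokens x dim) matrices, one per
    layer. Under the paper's motivation, a larger ratio would suggest the
    query demands knowledge updates that the activated hidden-state
    subspace does not cover (illustrative reading only).
    """
    ratios = []
    for H, G in zip(hidden_states, gradients):
        rank_h = effective_rank(H, tol)
        rank_g = effective_rank(G, tol)
        if rank_h > 0:
            ratios.append(rank_g / rank_h)
    return float(np.mean(ratios)) if ratios else 0.0

# Toy single-layer example: low-rank hidden states vs. near-full-rank
# gradients, standing in for tensors extracted from an actual LLM.
rng = np.random.default_rng(0)
H = rng.normal(size=(32, 4)) @ rng.normal(size=(4, 64))  # rank ~4
G = rng.normal(size=(32, 64))                            # rank ~32
score = grade_score([H], [G])
print(score)  # → 8.0 (32 / 4)
```

In practice the hidden states would come from a forward pass and the gradients from backpropagating a loss on the target answer; this sketch only shows the subspace-rank comparison itself.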

Yujing Wang, Yuanbang Liang, Yukun Lai, Hainan Zhang, Hanqi Yan • 2026

Related benchmarks

Task                     Dataset                  Result                           Rank
Knowledge Evaluation     Natural Questions (NQ)   Accuracy: 83                     45
Knowledge gap detection  TQA                      Accuracy: 83.2                   18
Knowledge gap detection  HQA                      Accuracy: 81.5                   18
Knowledge gap detection  GSM8K                    Accuracy: 83.6                   18
Knowledge gap detection  MATH                     Accuracy (Knowledge Gap): 89.1   18
Knowledge gap detection  MMLU                     Accuracy: 76.8                   18
