
Evaluating the Utility of Grounding Documents with Reference-Free LLM-based Metrics

About

The success of Retrieval Augmented Generation (RAG) depends on the utility the downstream LLM derives from the content used for grounding. There is no definitive specification for quantifying content utility, and existing metrics ignore model-specific capabilities, rely on costly annotations, or both. In this paper, we propose Grounding Generation Utility (GroGU), a model-specific and reference-free metric that defines utility as a function of the downstream LLM's entropy-based generation confidence. Despite requiring no annotations, GroGU is largely faithful in distinguishing ground-truth documents while capturing nuances that LLM-agnostic metrics ignore. We apply GroGU to train a query rewriter for RAG by identifying high-utility preference data for Direct Preference Optimization. Experiments show improvements of up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.
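The abstract describes utility as a function of the downstream LLM's entropy-based generation confidence. The paper's exact formulation is not given here, but the core idea can be sketched as follows: score a grounding document by the (negative) mean per-token entropy of the generation it induces, so that a document which makes the model more confident receives higher utility. The function names and toy distributions below are illustrative assumptions, not the paper's implementation.

```python
import math

def token_entropy(probs):
    # Shannon entropy (in nats) of one next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def grounding_utility(step_distributions):
    # Hypothetical utility score: negative mean per-token entropy of the
    # generation conditioned on the grounding document. Lower entropy
    # (a more confident generation) yields a higher utility score.
    mean_h = sum(token_entropy(d) for d in step_distributions) / len(step_distributions)
    return -mean_h

# Toy next-token distributions: a useful document tends to make the
# model's predictions peaked; a useless one leaves them near-uniform.
with_doc = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
without_doc = [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]]

assert grounding_utility(with_doc) > grounding_utility(without_doc)
```

In the paper's downstream application, such scores could rank candidate query rewrites by the utility of the documents they retrieve, yielding chosen/rejected pairs for DPO; that selection step is not shown here.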

Yilun Hua, Giuseppe Castellucci, Peter Schulam, Heba Elfardy, Kevin Small • 2026

Related benchmarks

Task                                   Dataset           Metric   Result   Rank
Conversational Query Retrieval         TopiOCQA          MRR      38.1     20
Conversational Query Retrieval         QReCC             MRR      45.9     20
Conversational Information Retrieval   TopiOCQA (test)   R@10     61.7     13
Conversational Information Retrieval   QReCC (test)      R@10     67.2     13
Conversational Question Answering      QReCC (test)      EM (%)   120      12
Conversational Question Answering      TopiOCQA (test)   EM       20.8     12
