
The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System

About

In modern RAG pipelines, query augmentation methods such as HyDE and query expansion are applied to every query, resulting in substantial LLM inference costs and increased end-to-end latency. The empirical justification for this overhead in real production traffic remains largely unexplored. We present a case study of the Danish National Encyclopedia, evaluating five retrieval workflows over 20,000 query-workflow pairs from production traffic and synthetic conditions. In this system, synthetic queries suggest that LLM augmentation is needed for over 90% of queries to achieve high retrieval coverage. However, under our production deferral policy, only 27.8% of real user queries need LLM augmentation. We call this gap the Coverage Illusion and attribute it to a structural mismatch between synthetic and real query distributions. Pre-retrieval routing cannot resolve this gap, as the need for LLM augmentation is only revealed after searching the index, a result confirmed by our evaluation of four machine learning paradigms. The coverage gap, undetectable from the query alone, motivates a post-retrieval cascade that runs workflows in cheapest-first order and escalates to LLM augmentation only when a step returns no documents. Operating entirely without training overhead or secondary serving infrastructure, the cascade improves quality by +0.140 Composite Overall points over Always-HyDE, reduces latency by 31.8%, and serves 72.2% of real user queries without LLM augmentation.
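
The cascade described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract states only that workflows run in cheapest-first order and escalate when a step returns no documents. The workflow names, the toy in-memory `INDEX`, and the `fake_llm_augmented_search` stand-in (which calls no LLM) are all hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

# A workflow is a (name, retrieve) pair; retrieve maps a query to document ids.
Workflow = Tuple[str, Callable[[str], List[str]]]

# Hypothetical toy index standing in for the real search backend.
INDEX = {"copenhagen": ["doc-copenhagen"], "niels bohr": ["doc-bohr"]}


def keyword_search(query: str) -> List[str]:
    """Cheap step: exact match on the normalized query string."""
    return list(INDEX.get(query.strip().lower(), []))


def fake_llm_augmented_search(query: str) -> List[str]:
    """Expensive step (stand-in only): match any token of an indexed term.

    A real system would call an LLM here (e.g. HyDE or query expansion).
    """
    q = query.lower()
    return [
        doc
        for term, docs in INDEX.items()
        if any(tok in q for tok in term.split())
        for doc in docs
    ]


def cascade_retrieve(query: str, workflows: Sequence[Workflow]) -> Tuple[str, List[str]]:
    """Run workflows in the given (cheapest-first) order.

    Escalate to the next workflow only when the current one returns no documents.
    """
    for name, retrieve in workflows:
        docs = retrieve(query)
        if docs:
            return name, docs
    return "none", []


def augmentation_free_rate(
    queries: Sequence[str], workflows: Sequence[Workflow], llm_steps: Set[str]
) -> float:
    """Fraction of queries answered by a step that is not LLM-augmented."""
    served = [cascade_retrieve(q, workflows)[0] for q in queries]
    free = sum(1 for name in served if name != "none" and name not in llm_steps)
    return free / len(queries)


# Cheapest-first ordering (assumed): plain search before LLM augmentation.
WORKFLOWS: List[Workflow] = [
    ("keyword", keyword_search),
    ("llm_augmented", fake_llm_augmented_search),
]
```

Because the escalation signal is an empty result set, the decision is made after searching the index rather than from the query text alone, which is the property the abstract says pre-retrieval routing lacks. A helper like `augmentation_free_rate` corresponds to the kind of figure reported as 72.2% for real user queries, though the toy data here says nothing about that number.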

Zafar Hussain, Kristoffer Nielbo • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Query Routing Classification | oracle high-contrast queries n=442 (test) | -- | 8 |
| RAG Query Routing | Encyclopedia Production Deployment Real User Queries | Augmentation-free Rate: 72.2 | 6 |
| Retrieval-Augmented Generation | 1,000 Real User Queries 1.0 (full set) | -- | 6 |
| Information Retrieval | Real User Queries 1,000 (full) | CO: 4.084 | 3 |
| Information Retrieval | Danish National Encyclopedia Real User queries | CO Score: 4.084 | 2 |
| Information Retrieval | Danish National Encyclopedia Synth-Conv queries | CO Score: 4.62 | 2 |
| LLM-augmented Retrieval | Danish National Encyclopedia Real User Queries High-contrast | CO Score: 3.936 | 2 |
| LLM-augmented Retrieval | Danish National Encyclopedia Synth-Conv Queries High-contrast | CO Score: 4.249 | 2 |
| Information Retrieval | Danish National Encyclopedia Synth-Polluted queries | CO Score: 4.559 | 2 |
| Information Retrieval | Danish National Encyclopedia Synth-Keywords queries | CO Score: 4.471 | 2 |
Showing 10 of 12 rows
