
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking

About

Recent work has identified retrieval heads, a subset of attention heads responsible for retrieving salient information in long-context language models (LMs), as measured by their copy-paste behavior in Needle-in-a-Haystack tasks. In this paper, we introduce QRHead (Query-Focused Retrieval Head), an improved set of attention heads that enhance retrieval from long context. We identify QRHead by aggregating attention scores with respect to the input query, using a handful of examples from real-world tasks (e.g., long-context QA). We further introduce QRRetriever, an efficient and effective retriever that uses the accumulated attention mass of QRHead as retrieval scores. We use QRRetriever for long-context reasoning by selecting the most relevant parts with the highest retrieval scores. On multi-hop reasoning tasks LongMemEval and CLIPPER, this yields over 10% performance gains over full context and outperforms strong dense retrievers. We also evaluate QRRetriever as a re-ranker on the BEIR benchmark and find that it achieves strong zero-shot performance, outperforming other LLM-based re-rankers such as RankGPT. Further analysis shows that both the query-context attention scoring and task selection are crucial for identifying QRHead with strong downstream utility. Overall, our work contributes a general-purpose retriever and offers interpretability insights into the long-context capabilities of LMs.
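
The two steps described in the abstract (picking heads by query-to-context attention on a few examples, then scoring context spans by the attention mass those heads accumulate) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names (`span_mass`, `select_heads`, `rank_spans`), the nested-list attention layout, and the toy weights are hypothetical, and details the abstract does not specify, such as any score normalization, are left out.

```python
from typing import Dict, List, Tuple

Head = Tuple[int, int]   # hypothetical (layer, head) index
Span = Tuple[int, int]   # [start, end) context-token range of one chunk
# attn[head][q][c]: weight from query token q to context token c (toy layout)
Attention = Dict[Head, List[List[float]]]


def span_mass(attn_qc: List[List[float]], span: Span) -> float:
    """Total attention that all query tokens place on one context span."""
    start, end = span
    return sum(sum(row[start:end]) for row in attn_qc)


def select_heads(examples: List[Tuple[Attention, Span]], top_k: int) -> List[Head]:
    """Sum each head's mass on the gold span over a few examples, keep top_k."""
    totals: Dict[Head, float] = {}
    for attn, gold_span in examples:
        for head, weights in attn.items():
            totals[head] = totals.get(head, 0.0) + span_mass(weights, gold_span)
    return sorted(totals, key=lambda h: totals[h], reverse=True)[:top_k]


def rank_spans(attn: Attention, heads: List[Head], spans: List[Span]) -> List[int]:
    """Rank span indices by attention mass accumulated over the chosen heads."""
    scores = [sum(span_mass(attn[h], sp) for h in heads) for sp in spans]
    return sorted(range(len(spans)), key=lambda i: scores[i], reverse=True)


# Toy data: one query token, four context tokens split into two spans.
spans = [(0, 2), (2, 4)]
attn = {
    (0, 0): [[0.1, 0.1, 0.4, 0.4]],      # concentrates on the second span
    (0, 1): [[0.25, 0.25, 0.25, 0.25]],  # spreads attention uniformly
}
heads = select_heads([(attn, spans[1])], top_k=1)  # second span is "gold"
ranking = rank_spans(attn, heads, spans)
print(heads, ranking)
```

In this toy setup the head that concentrates on the gold span is the one selected, and ranking with it places the second span first. With a real model, the attention weights would come from a forward pass over the concatenated context and query rather than from hand-written lists.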

Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | NarrativeQA | F1: 31.4 | 36 |
| Retrieval | HotpotQA | R@5: 94.8 | 24 |
| Re-ranking | BEIR (test) | NQ: 58.6 | 23 |
| End-to-End Performance | LongMemEval | Top-5 Recall: 59.8 | 20 |
| End-to-End Performance | Clipper | Top-3 Recall: 47.6 | 20 |
| Retrieval | LongMemEval | Recall@5: 85.5 | 18 |
| Retrieval | Clipper | Recall@3: 93.8 | 18 |
| Retrieval | MuSiQue | Recall@5: 71.22 | 10 |
| Retrieval | NarrativeQA | Recall@3: 24.28 | 8 |
| Retrieval | Overall (MuSiQue, HotpotQA, NarrativeQA, DetectiveQA) | Avg Recall@3: 50.33 | 8 |
Showing 10 of 13 rows
