Finding What Matters: Anchoring Context Knowledge with Evolving Indices for Iterative Retrieval

About

Retrieval-Augmented Generation (RAG) has become a dominant paradigm for mitigating hallucinations in Large Language Models (LLMs) by incorporating external knowledge. However, existing RAG systems often struggle to integrate and reason over key evidence scattered across noisy retrieved documents, particularly in multi-hop scenarios. In this paper, we propose KAIR, a Knowledge Anchoring framework for Iterative Retrieval that anchors salient knowledge within the retrieved documents to guide LLMs toward the key information. During iterative retrieval, KAIR progressively updates a knowledge index to anchor salient evidence from the retrieved documents. The evolving index acts as a navigational aid that enables the LLM to assess knowledge sufficiency and formulate subsequent retrieval queries. Finally, KAIR generates answers by jointly leveraging the retrieved documents and the finalized anchoring index. Experiments on four multi-hop question answering benchmarks demonstrate that KAIR consistently outperforms strong RAG baselines. Further analysis shows that KAIR effectively anchors key knowledge and alleviates context noise during iterative retrieval, improving the LLM's ability to associate and reason over evidence dispersed across retrieved documents. All code and data are available at https://github.com/NEUIR/KAIR.

Mingyan Wu, Zhenghao Liu, Xinze Li, Yuqing Lan, Yukun Yan, Shuo Wang, Cheng Yang, Minghe Yu, Zheni Zeng, Maosong Sun • 2026
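
The loop described in the abstract (retrieve, update the anchoring index, check sufficiency, form the next query, then answer from documents plus index) can be pictured with a minimal sketch. This is not the authors' implementation, which lives in the linked repository. Every name here (`AnchorState`, `iterative_anchored_retrieval`, and the five injected callables) is hypothetical, and the retriever and LLM calls are left as plain functions that the caller supplies.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AnchorState:
    """Hypothetical container: retrieved documents plus the evolving index."""
    documents: List[str] = field(default_factory=list)
    index: List[str] = field(default_factory=list)


def iterative_anchored_retrieval(
    question: str,
    retrieve: Callable[[str], List[str]],
    extract_anchors: Callable[[str, List[str], List[str]], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    next_query: Callable[[str, List[str]], str],
    generate: Callable[[str, List[str], List[str]], str],
    max_rounds: int = 3,  # assumed cap; the abstract does not state one
) -> str:
    """Sketch of iterative retrieval guided by an evolving anchoring index."""
    state = AnchorState()
    query = question
    for _ in range(max_rounds):
        docs = retrieve(query)
        # Accumulate newly retrieved documents, skipping duplicates.
        for doc in docs:
            if doc not in state.documents:
                state.documents.append(doc)
        # Update the index with salient evidence from this round's documents.
        for anchor in extract_anchors(question, docs, state.index):
            if anchor not in state.index:
                state.index.append(anchor)
        # The index is what the model consults to decide whether to stop.
        if is_sufficient(question, state.index):
            break
        # Otherwise the index also drives the next retrieval query.
        query = next_query(question, state.index)
    # The final answer uses both the documents and the finalized index.
    return generate(question, state.documents, state.index)
```

In practice `extract_anchors`, `is_sufficient`, `next_query`, and `generate` would each be an LLM prompt, and `retrieve` a dense or sparse retriever; passing them in as callables keeps the control flow visible without assuming any particular model or library.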

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-hop Question Answering | HotpotQA (test) | F1 29.44 | 311 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM 22.4 | 226 |
| Multi-hop Question Answering | 2WikiMQA | F1 Score 71.59 | 161 |
| Multi-hop Question Answering | HotpotQA | F1 69.64 | 79 |
| Multi-hop Question Answering | Bamboogle | EM 32.23 | 51 |
| Multi-hop Question Answering | MuSiQue | F1 Score 37.68 | 15 |
| Multi-hop Question Answering | Average (MuSiQue, HotpotQA, 2WikiMQA, Bamboogle) | F1 Score 55.34 | 15 |
