
Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning

About

Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir. Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources, but existing methods often retrieve irrelevant or noisy information, hindering accurate reasoning. In this paper, we propose AutoRefine, a reinforcement learning post-training framework that adopts a new "search-and-refine-during-think" paradigm. AutoRefine introduces explicit knowledge refinement steps between successive search calls, enabling the model to iteratively filter, distill, and organize evidence before generating an answer. Furthermore, we incorporate tailored retrieval-specific rewards alongside answer correctness rewards using group relative policy optimization. Experiments on single-hop and multi-hop QA benchmarks demonstrate that AutoRefine significantly outperforms existing approaches, particularly in complex, multi-hop reasoning scenarios. Detailed analysis shows that AutoRefine issues frequent, higher-quality searches and synthesizes evidence effectively.
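The abstract mentions combining retrieval-specific rewards with answer correctness rewards under group relative policy optimization (GRPO). The sketch below illustrates the general idea only. The `combined_reward` function and its `retrieval_weight` parameter are hypothetical and are not taken from the paper; the group-normalized advantage follows the commonly described GRPO form (reward minus group mean, divided by group standard deviation), here using the population standard deviation as an assumption.

```python
from statistics import mean, pstdev


def combined_reward(answer_score: float, retrieval_score: float,
                    retrieval_weight: float = 0.1) -> float:
    """Hypothetical mix of an answer reward and a retrieval reward.

    The weighting scheme is an assumption for illustration, not the
    reward defined by the AutoRefine authors.
    """
    return answer_score + retrieval_weight * retrieval_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each rollout's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts sampled for the same question (made-up scores).
rewards = [
    combined_reward(1.0, 1.0),  # correct answer, useful evidence retrieved
    combined_reward(0.0, 1.0),  # wrong answer, useful evidence retrieved
    combined_reward(1.0, 0.0),  # correct answer, no useful evidence
    combined_reward(0.0, 0.0),  # wrong answer, no useful evidence
]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above their group's mean receive positive advantages and those below receive negative ones, so no separate value network is needed to form the baseline.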

Yaorui Shi, Sihang Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-hop Question Answering | 2WikiMultihopQA | EM | 32.8 | 278
Multi-hop Question Answering | MuSiQue | EM | 16.9 | 106
Multi-hop Question Answering | Bamboogle | Exact Match | 32 | 97
Single-hop Question Answering | TriviaQA | EM | 58.7 | 62
Multi-hop Question Answering | HotpotQA | Exact Match (EM) | 38.2 | 56
Single-hop Question Answering | PopQA | EM | 44.9 | 55
Multi-hop Question Answering | BrowseComp-ZH | LJFT | 5.19 | 5
Multi-hop Question Answering | Web Dancer | LJFT | 39.09 | 5
Multi-hop Question Answering | MuSiQue | LJFT | 19.8 | 5
Multi-hop Question Answering | Average (BrowseComp-ZH, Bamboogle, MuSiQue, Web Dancer) (Overall) | LJFT Score | 28.02 | 5
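
Several rows above report Exact Match (EM). The listing does not say how EM was computed for these entries, so the following is only a minimal sketch assuming the normalization widely used in QA evaluation (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names are illustrative.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match a reference answer."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this assumed normalization, a prediction such as "The Eiffel Tower." counts as a match for the gold answer "eiffel tower", while any difference in content words counts as a miss.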