
CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic

About

Tool-Integrated Reasoning (TIR) with search engines enables large language models to iteratively retrieve up-to-date external knowledge, enhancing adaptability and generalization in complex question-answering tasks. However, existing search agent pipelines typically depend on reinforcement-learning-based optimization, which often suffers from sparse outcome rewards, leading to inefficient exploration and unstable training. We introduce CriticSearch, a fine-grained credit-assignment framework that supplies dense, turn-level feedback via a retrospective critic mechanism. During training, a frozen, asymmetric critique LLM retrospectively evaluates each turn using privileged information from the full trajectory and gold answers, and these assessments are converted into stable, dense rewards that guide policy improvement. Experimental results across diverse multi-hop reasoning benchmarks demonstrate that CriticSearch consistently outperforms existing baselines, achieving faster convergence, improved training stability, and higher performance.

Yaocheng Zhang, Haohuan Huang, Zijun Song, Yuanheng Zhu, Qichao Zhang, Zijie Zhao, Dongbin Zhao • 2025
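
The turn-level credit assignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that a frozen critic judges each turn with access to the full trajectory and the gold answer, and that the judgments become dense rewards. The binary good/bad labels, the reward values, the way the outcome reward is added to the final turn, and the names `turn_level_rewards` and `toy_critic` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical critic signature (an assumption, not from the paper):
# (turn index, full trajectory, gold answer) -> True if the turn is judged helpful.
Critic = Callable[[int, Sequence[str], str], bool]


def turn_level_rewards(
    trajectory: Sequence[str],
    gold_answer: str,
    critic: Critic,
    outcome_reward: float,
    good: float = 1.0,   # assumed reward for a turn judged helpful
    bad: float = -1.0,   # assumed reward for a turn judged unhelpful
) -> List[float]:
    """Return one reward per turn instead of a single end-of-episode reward."""
    rewards = [
        good if critic(i, trajectory, gold_answer) else bad
        for i in range(len(trajectory))
    ]
    if rewards:
        # Assumption: the sparse outcome reward is added to the final turn.
        rewards[-1] += outcome_reward
    return rewards


def toy_critic(i: int, trajectory: Sequence[str], gold: str) -> bool:
    """Stand-in for the frozen critique LLM: a turn counts as helpful
    if its text mentions the gold answer."""
    return gold.lower() in trajectory[i].lower()


trajectory = [
    "search: capital of France",
    "result: Paris is the capital",
    "answer: Paris",
]
rewards = turn_level_rewards(trajectory, "Paris", toy_critic, outcome_reward=1.0)
print(rewards)
```

In a real pipeline the critic would be an LLM call rather than a string match, and the per-turn rewards would feed a policy-gradient update; both are outside the scope of this sketch.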

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-hop Question Answering | HotpotQA (test) | -- | 255 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM 40.9 | 195 |
| Question Answering | HotpotQA | EM 41.4 | 109 |
| Question Answering | 2WikiMultihopQA | EM 40.9 | 107 |
| Multi-hop Question Answering | Bamboogle (test) | EM 36.8 | 84 |
| Multi-hop Question Answering | Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle) | HotpotQA Score 44.2 | 48 |
| Question Answering | Bamboogle | EM Accuracy (%) 36.8 | 45 |
| Multi-hop Question Answering | Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle) (test) | HotpotQA Score 0.414 | 44 |
| Question Answering | MuSiQue | EM 18 | 24 |
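
Most results above are reported as EM (exact match). The page does not say which answer normalization was used, so the following is only a sketch of the common SQuAD-style convention (lowercase, strip punctuation and English articles, collapse whitespace); the function names are illustrative and the normalization rules are an assumption.

```python
import re
import string
from typing import List, Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: Sequence[str], references: Sequence[List[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))
print(em_score(["Paris", "london"], [["Paris"], ["Berlin"]]))
```

Note that the table mixes scales: some rows report scores as percentages (for example 44.2) and one as a fraction (0.414), so values should be put on a common scale before they are compared.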