APEX-Searcher: Refining Credit Assignment with Subgoaling for Agentic Retrieval-Augmented Generation

About

Retrieval-augmented generation (RAG) connects large language models (LLMs) to external knowledge, but single-round retrieval is often insufficient for complex multi-hop questions. To enhance search capabilities for complex tasks, most existing works integrate multi-round iterative retrieval with reasoning processes via end-to-end training. While these approaches improve problem-solving performance, they still face challenges in task reasoning and model training, especially ambiguous retrieval execution paths and sparse rewards in end-to-end reinforcement learning (RL), which can lead to inaccurate retrieval results and lower performance. We attribute these failures to hierarchical credit entanglement: a single final reward updates planning and execution together, so the model cannot clearly separate plan errors from retrieval errors. We propose APEX-Searcher, which uses a Refining Credit Assignment paradigm: planning is optimized by RL with a plan-level reward, while execution is learned by SFT. Extensive experiments show consistent gains in both multi-hop RAG and task planning across benchmarks.
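
The credit entanglement described above can be illustrated with a toy sketch. This is not the authors' training code: the `Episode` fields, the oracle-style `plan_ok` and `retrieval_ok` flags, and both credit functions are hypothetical names introduced only for illustration, and the abstract does not say how the plan-level reward is actually computed.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One rollout of a hypothetical plan-then-retrieve agent."""
    plan_ok: bool       # assumption: an oracle says whether the plan was correct
    retrieval_ok: bool  # assumption: an oracle says whether retrieval succeeded


def final_reward(ep: Episode) -> float:
    # End-to-end signal: the answer is right only if both stages succeed.
    return 1.0 if ep.plan_ok and ep.retrieval_ok else 0.0


def entangled_credit(ep: Episode) -> dict:
    # A single final reward updates planner and executor together,
    # so a good plan is penalized whenever retrieval fails.
    r = final_reward(ep)
    return {"planner": r, "executor": r}


def separated_credit(ep: Episode) -> dict:
    # Hypothetical plan-level reward: the planner is scored on the plan alone.
    # The executor gets no RL reward here because the abstract says
    # execution is learned by SFT instead.
    return {"planner": 1.0 if ep.plan_ok else 0.0}


# A correct plan followed by a failed retrieval.
episode = Episode(plan_ok=True, retrieval_ok=False)
print(entangled_credit(episode))
print(separated_credit(episode))
```

In this toy case the entangled scheme assigns the planner zero reward despite a correct plan, while the plan-level scheme rewards it, which is the separation the abstract argues for.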

Kun Chen, Qingchao Kong, Zhao Feifei, Wenji Mao • 2026

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA (val)	Exact Match (EM) 54	44
Multi-hop Question Answering	HotpotQA (val)	Exact Match (EM) 40.2	31
Multi-hop Question Answering	Bamboogle standard (val)	Exact Match (EM) 40	20
Multi-hop Question Answering	MuSiQue standard (val)	Exact Match (EM) 16.4	19
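
For readers unfamiliar with the metric in the table, Exact Match is commonly computed along the following lines. This is a minimal sketch assuming the widely used SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the page does not state which normalization the authors used, and the function names are illustrative.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list, references: list) -> float:
    """Percentage of predictions that exactly match one of their references."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Made-up examples, not taken from any benchmark.
preds = ["The Eiffel Tower.", "London"]
golds = [["Eiffel Tower"], ["Berlin"]]
print(em_score(preds, golds))
```

Under this definition a score is the percentage of questions answered exactly, so the first figure in each row is presumably on a 0 to 100 scale.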

