APEX-Searcher: Refining Credit Assignment with Subgoaling for Agentic Retrieval-Augmented Generation
About
Retrieval-augmented generation (RAG) connects large language models (LLMs) to external knowledge, but single-round retrieval is often insufficient for complex multi-hop questions. To enhance search capabilities for complex tasks, most existing works integrate multi-round iterative retrieval with reasoning processes via end-to-end training. While these approaches improve problem-solving performance, they still face challenges in task reasoning and model training, especially ambiguous retrieval execution paths and sparse rewards in end-to-end reinforcement learning (RL), which can lead to inaccurate retrieval results and lower performance. We attribute these failures to hierarchical credit entanglement: a single final reward updates planning and execution together, so the model cannot clearly separate plan errors from retrieval errors. We propose APEX-Searcher, which uses a Refining Credit Assignment paradigm: planning is optimized by RL with a plan-level reward, while execution is learned by SFT. Extensive experiments show consistent gains in both multi-hop RAG and task planning across benchmarks.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultihopQA (val) | Exact Match (EM)54 | 44 | |
| Multi-hop Question Answering | HotpotQA (val) | Exact Match40.2 | 31 | |
| Multi-hop Question Answering | Bamboogle standard (val) | Exact Match (EM)40 | 20 | |
| Multi-hop Question Answering | MuSiQue standard (val) | Exact Match (EM)16.4 | 19 |