
APEX-Searcher: Refining Credit Assignment with Subgoaling for Agentic Retrieval-Augmented Generation

About

Retrieval-augmented generation (RAG) connects large language models (LLMs) to external knowledge, but single-round retrieval is often insufficient for complex multi-hop questions. To improve search on such tasks, most existing work integrates multi-round iterative retrieval with reasoning via end-to-end training. While these approaches improve problem-solving performance, they still face challenges in task reasoning and model training, in particular ambiguous retrieval execution paths and sparse rewards in end-to-end reinforcement learning (RL), which can lead to inaccurate retrieval results and degraded performance. We attribute these failures to hierarchical credit entanglement: a single final reward updates planning and execution together, so the model cannot clearly separate plan errors from retrieval errors. We propose APEX-Searcher, which uses a Refining Credit Assignment paradigm: planning is optimized by RL with a plan-level reward, while execution is learned by supervised fine-tuning (SFT). Extensive experiments show consistent gains in both multi-hop RAG and task planning across benchmarks.
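
The credit entanglement described above can be illustrated with a toy sketch. Everything here is hypothetical and is not the paper's reward design: the `Episode` fields, `final_reward`, and `plan_level_reward` are illustrative names, and the abstract does not specify how the plan-level reward is actually computed. The sketch only shows why a single outcome reward cannot tell a planning failure from a retrieval failure, while a separate plan-level signal can.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical flags, assumed known only for the sake of illustration.
    plan_ok: bool       # the question was decomposed into correct subgoals
    execution_ok: bool  # retrieval succeeded for the subgoals


def final_reward(ep: Episode) -> float:
    """Single outcome reward: 1.0 only when both stages succeed."""
    return 1.0 if (ep.plan_ok and ep.execution_ok) else 0.0


def plan_level_reward(ep: Episode) -> float:
    """Assumed plan-only signal that ignores execution quality."""
    return 1.0 if ep.plan_ok else 0.0


bad_plan = Episode(plan_ok=False, execution_ok=True)
bad_execution = Episode(plan_ok=True, execution_ok=False)

# Both failures receive the same final reward, so the planner gets
# identical feedback whether or not its plan was actually correct.
entangled = (final_reward(bad_plan), final_reward(bad_execution))

# A plan-level reward separates the two cases.
separated = (plan_level_reward(bad_plan), plan_level_reward(bad_execution))
```

In this toy setting `entangled` holds two identical values while `separated` holds two different ones, which is the distinction the abstract draws between end-to-end RL and plan-level credit assignment.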

Kun Chen, Qingchao Kong, Zhao Feifei, Wenji Mao • 2026

Related benchmarks

Task                          Dataset                     Result                  Rank
Multi-hop Question Answering  2WikiMultihopQA (val)       Exact Match (EM): 54    44
Multi-hop Question Answering  HotpotQA (val)              Exact Match (EM): 40.2  31
Multi-hop Question Answering  Bamboogle standard (val)    Exact Match (EM): 40    20
Multi-hop Question Answering  MuSiQue standard (val)      Exact Match (EM): 16.4  19
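
For readers unfamiliar with the metric in the table, the following is a minimal sketch of how Exact Match is commonly computed for question answering, assuming the widely used normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names are illustrative, and the exact evaluation script behind the numbers above is not stated on this page.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match a gold answer."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Made-up examples for illustration only; not taken from any benchmark.
demo_predictions = ["The Eiffel Tower", "Paris", "1999"]
demo_references = [["Eiffel Tower"], ["London"], ["1999"]]
demo_score = em_score(demo_predictions, demo_references)
```

Under this definition a score such as 40.2 means that 40.2% of predicted answers matched a gold answer after normalization.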
