Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning

About

Large Language Models (LLMs) equipped with modern Retrieval-Augmented Generation (RAG) systems often employ multi-turn interaction pipelines to interface with search engines for complex reasoning tasks. However, such multi-turn interactions inevitably produce long intermediate contexts, as context length grows exponentially with exploration depth. This exposes a well-known limitation of LLMs: their difficulty in effectively leveraging information from long contexts. The problem is further amplified in RAG systems that depend on in-context learning, where few-shot demonstrations must also be included in the prompt, compounding the context-length bottleneck. To address these challenges, we propose Mujica-MyGO, a unified framework for efficient multi-turn reasoning in RAG. Inspired by the divide-and-conquer principle, we introduce Mujica (Multi-hop Joint Intelligence for Complex Question Answering), a multi-agent RAG workflow that decomposes multi-turn interactions into cooperative sub-interactions, thereby mitigating long-context issues. To eliminate the dependency on in-context learning, we further develop MyGO (Minimalist Policy Gradient Optimization), a lightweight and efficient reinforcement learning algorithm that enables effective post-training of LLMs within complex RAG pipelines. We provide theoretical guarantees for MyGO's convergence to the optimal policy. Empirical evaluations across diverse question-answering benchmarks, covering both text corpora and knowledge graphs, show that Mujica-MyGO achieves superior performance.
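The divide-and-conquer idea in the abstract can be illustrated with a toy sketch. This is not the authors' Mujica implementation: the corpus, the `plan` and `answer_sub_question` functions, and the `{prev}` placeholder convention are all hypothetical stand-ins chosen for illustration. The point it shows is only that each worker call receives one short sub-question rather than the full interaction history.

```python
from typing import Dict, List

# Hypothetical toy corpus standing in for a search engine or knowledge graph.
CORPUS: Dict[str, str] = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}


def plan(question: str) -> List[str]:
    """Hypothetical planner: returns sub-question templates.

    '{prev}' is replaced by the answer to the previous sub-question.
    The plan is hard-coded here; a real system would generate it with an LLM.
    """
    return ["Who directed Inception?", "Where was {prev} born?"]


def answer_sub_question(sub_question: str) -> str:
    """Hypothetical worker: sees only its own sub-question, never the full history."""
    return CORPUS[sub_question]


def solve(question: str) -> str:
    """Answer a multi-hop question by chaining short, independent sub-interactions."""
    prev = ""
    for template in plan(question):
        sub_question = template.format(prev=prev)
        # Only the previous answer is carried forward, not the retrieved context.
        prev = answer_sub_question(sub_question)
    return prev


if __name__ == "__main__":
    print(solve("Where was the director of Inception born?"))
```

In this sketch the context passed to each worker stays constant in size regardless of how many hops the question needs, which is the property the abstract attributes to decomposing multi-turn interactions into sub-interactions.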

Yihong Wu, Liheng Ma, Muzhi Li, Jiaming Zhou, Lei Ding, Jianye Hao, Ho-fung Leung, Irwin King, Yingxue Zhang, Jian-Yun Nie • 2025

Related benchmarks

Task	Dataset	Metric	Result
Multi-hop Question Answering	HotpotQA	F1 Score	53.79	294
Multi-hop Question Answering	2Wiki	Exact Match	53.17	215
Multi-hop Question Answering	MuSiQue	Exact Match (EM)	26.11	31
Multi-hop Question Answering	2Wiki-KG	EM	85.93	7
Multi-hop Question Answering	2Wiki-Text	EM	58.88	7
Multi-hop Question Answering	Hotpot Kimi	EM	54.07	4
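The table reports Exact Match (EM) and F1 scores. This page does not state how they were computed, so the following is a minimal sketch assuming the common SQuAD-style convention: answers are lowercased, stripped of punctuation and English articles, and F1 is measured over overlapping tokens. The function names are illustrative, and the scores here are on a 0 to 1 scale, whereas the table values appear to be percentages.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, count it as a match only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(f1_score("Christopher Nolan", "Nolan"))
```

A benchmark score would then be the mean of these per-question values over the dataset, typically multiplied by 100.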

