From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

About

Large Language Models (LLMs) exhibit high reasoning capacity in medical question answering, but their tendency to hallucinate and to rely on outdated knowledge poses critical risks in healthcare. While Retrieval-Augmented Generation (RAG) mitigates these issues, existing methods rely on noisy token-level signals and lack the multi-round refinement required for complex reasoning. In this paper, we propose MA-RAG (Multi-Round Agentic RAG), a framework that facilitates test-time scaling for complex medical reasoning by iteratively evolving both external evidence and internal reasoning history within an agentic refinement loop. At each round, the agent transforms semantic conflict among candidate responses into actionable queries to retrieve external evidence, while optimizing historical reasoning traces to mitigate long-context degradation. MA-RAG extends the self-consistency principle by leveraging the lack of consistency as a proactive signal for multi-round agentic reasoning and retrieval, and mirrors a boosting mechanism that iteratively minimizes the residual error toward a stable, high-fidelity medical consensus. Extensive evaluations across 7 medical Q&A benchmarks show that MA-RAG consistently surpasses competitive inference-time scaling and RAG baselines, delivering a substantial gain of +6.8 points in average accuracy over the backbone model. Our code is available at https://github.com/NJU-RL/MA-RAG.
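The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general idea of treating disagreement among sampled answers as the trigger for another round of retrieval. All names here (`agreement`, `multi_round_loop`, `sample`, `retrieve`, `condense`, and the toy stand-ins) are hypothetical, and a plain majority-vote share is assumed as a simple proxy for the "semantic conflict" signal.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, str]  # (final answer, reasoning trace)


def agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and the fraction of candidates that chose it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def multi_round_loop(
    question: str,
    sample: Callable[[str, List[str], str], List[Candidate]],
    retrieve: Callable[[str], List[str]],
    condense: Callable[[List[Candidate]], str],
    max_rounds: int = 3,
    threshold: float = 1.0,
) -> Tuple[Optional[str], int]:
    """Hypothetical refinement loop: stop once candidates agree, else retrieve."""
    evidence: List[str] = []   # external evidence accumulated across rounds
    history = ""               # condensed reasoning carried to the next round
    answer: Optional[str] = None
    rounds_used = 0
    for round_idx in range(1, max_rounds + 1):
        rounds_used = round_idx
        candidates = sample(question, evidence, history)
        answers = [a for a, _ in candidates]
        answer, share = agreement(answers)
        if share >= threshold:
            break  # consensus reached, no further retrieval needed
        # Assumption: the conflict is turned into a query by naming the
        # disputed options; a real system would likely use an LLM here.
        disputed = sorted(set(answers))
        query = f"{question} Which is correct: {' or '.join(disputed)}?"
        evidence.extend(retrieve(query))
        # Keep the history short to limit long-context degradation.
        history = condense(candidates)
    return answer, rounds_used


# Toy stand-ins so the sketch runs without a model or a search index.
def toy_sample(question: str, evidence: List[str], history: str) -> List[Candidate]:
    if not evidence:
        return [("A", "guess 1"), ("B", "guess 2"), ("A", "guess 3")]
    return [("B", "supported by evidence")] * 3


def toy_retrieve(query: str) -> List[str]:
    return ["passage supporting B"]


def toy_condense(candidates: List[Candidate]) -> str:
    return " | ".join(sorted({trace for _, trace in candidates}))


result = multi_round_loop("Q?", toy_sample, toy_retrieve, toy_condense)
print(result)
```

In the toy run, the first round produces a split vote, which triggers one retrieval; the second round reaches full agreement and the loop stops. Setting `threshold` below 1.0 would accept a partial majority and trade accuracy for fewer rounds.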

Wenhao Wu, Zhentao Tang, Yafu Li, Shixiong Kai, Mingxuan Yuan, Chunlin Chen, Zhi Wang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 67.2 | 346 |
| Medical Question Answering | MedQA | Accuracy | 77.1 | 153 |
| Medical Question Answering | MedExpQA | Overall Accuracy | 78.4 | 70 |
| Medical Question Answering | Medbullets | Accuracy | 59.1 | 65 |
| Medical Question Answering | MedXpertQA | Accuracy | 22.2 | 31 |
| Medical Question Answering | MMLU-P | Accuracy | 70.9 | 29 |
| Medical Question Answering | NEJM | Accuracy | 60.8 | 16 |