SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning

About

Retrieval-Augmented Generation (RAG) is widely employed to mitigate risks such as hallucinations and knowledge obsolescence in medical question answering, yet its predominantly single-round, static retrieval paradigm is misaligned with the multi-stage process of clinical reasoning. This compressed workflow induces two structural deficiencies: question-to-query translation often lacks clinically grounded semantic interpretation, and retrieval lacks iterative sufficiency feedback, making it difficult to form reliable evidence chains. We argue that both issues stem from a deeper cause: overloading a single reasoning chain with the heterogeneous tasks of interpretation, exploration, and adjudication. The remedy is to reconstruct the workflow via task decoupling and dynamic multi-round exploration. To this end, we propose SEMA-RAG, a Self-Evolving Multi-Agent RAG framework for medical question answering, which assigns these roles to three specialist agents: the Interpreter Agent for clinical schema interpretation, the Explorer Agent for sufficiency-driven self-evolving retrieval, and the Arbiter Agent for evidence adjudication and answer selection. Across five benchmarks and five LLM backbones, SEMA-RAG outperforms the strongest baseline by 6.46 accuracy points on average, measured per backbone.
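
The three-role split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function name, the `max_rounds` cap, the keyword-matching retriever, the sufficiency rule, and the toy corpus are assumptions made only for illustration. A real system would presumably back each role with an LLM and a proper retriever. The sketch only shows the control flow: interpret once, retrieve in rounds until a sufficiency check passes, then adjudicate over the collected evidence.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Schema:
    """Hypothetical output of the Interpreter: the question plus initial queries."""
    question: str
    queries: List[str]


@dataclass
class EvidencePool:
    """Hypothetical evidence store filled by the Explorer across rounds."""
    passages: List[str] = field(default_factory=list)


def interpreter(question: str) -> Schema:
    # Stand-in for clinical schema interpretation; here the question is the only query.
    return Schema(question=question, queries=[question])


def explorer(
    schema: Schema,
    retrieve: Callable[[str], List[str]],
    is_sufficient: Callable[[Schema, EvidencePool], bool],
    refine: Callable[[Schema, EvidencePool], List[str]],
    max_rounds: int = 3,  # assumed cap, not taken from the paper
) -> Tuple[EvidencePool, int]:
    """Multi-round retrieval that stops once the sufficiency check passes."""
    pool = EvidencePool()
    queries = list(schema.queries)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        for query in queries:
            for passage in retrieve(query):
                if passage not in pool.passages:
                    pool.passages.append(passage)
        if is_sufficient(schema, pool):
            break
        # Sufficiency feedback: derive the next round's queries from the evidence so far.
        queries = refine(schema, pool)
    return pool, rounds


def arbiter(
    pool: EvidencePool,
    options: List[str],
    score: Callable[[str, EvidencePool], float],
) -> str:
    # Stand-in for evidence adjudication: pick the best-supported option.
    return max(options, key=lambda option: score(option, pool))


# Toy components below are made-up illustration data, not clinical guidance.
TOY_CORPUS = {
    "chest pain": "Chest pain with ST elevation suggests myocardial infarction.",
    "st elevation": "ST elevation myocardial infarction is treated with urgent reperfusion.",
}


def toy_retrieve(query: str) -> List[str]:
    return [text for key, text in TOY_CORPUS.items() if key in query.lower()]


def toy_is_sufficient(schema: Schema, pool: EvidencePool) -> bool:
    return len(pool.passages) >= 2


def toy_refine(schema: Schema, pool: EvidencePool) -> List[str]:
    # Follow up on any corpus term mentioned in the evidence gathered so far.
    return [key for key in TOY_CORPUS if any(key in p.lower() for p in pool.passages)]


def toy_score(option: str, pool: EvidencePool) -> float:
    return float(sum(option.lower() in p.lower() for p in pool.passages))


schema = interpreter("Patient with chest pain: what is the next step?")
pool, rounds_used = explorer(schema, toy_retrieve, toy_is_sufficient, toy_refine)
answer = arbiter(pool, ["reperfusion", "antibiotics"], toy_score)
print(answer, rounds_used)
```

In this toy run the first round surfaces only one passage, so the sufficiency check fails and the Explorer issues a follow-up query derived from that passage. This is the kind of iterative feedback that single-round retrieval lacks.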

Yongfeng Huang, Ruiying Chen, James Cheng • 2026

Related benchmarks

Task	Dataset	Result
Medical Question Answering	MedMCQA	Accuracy 76.07	591
Medical Question Answering	PubMedQA	Accuracy 59.2	122
Medical Question Answering	MMLU Med	Accuracy 92.1	111
Medical Question Answering	BioASQ	Accuracy 88.67	63
Medical Question Answering	MedQA US	Accuracy 90.42	43
Health-related dialogue and decision-making	HealthBench Main	Average Score 33.64	24
Multi-turn clinical response generation	MAQuE (test)	Accuracy 61.5	2
