
SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning

About

Retrieval-Augmented Generation (RAG) is widely employed to mitigate risks such as hallucinations and knowledge obsolescence in medical question answering, yet its predominantly single-round, static retrieval paradigm misaligns with the multi-stage process of clinical reasoning. This compressed workflow induces two structural deficiencies: question-to-query translation often lacks clinically grounded semantic interpretation, and retrieval lacks iterative sufficiency feedback, making it difficult to form reliable evidence chains. We argue that both issues stem from a deeper cause: overloading a single reasoning chain with heterogeneous tasks of interpretation, exploration, and adjudication. The remedy is to reconstruct the workflow via task decoupling and dynamic multi-round exploration. To this end, we propose SEMA-RAG, a Self-Evolving Multi-Agent RAG framework for medical question answering, which assigns these roles to three specialist agents: the Interpreter Agent for clinical schema interpretation, the Explorer Agent for sufficiency-driven self-evolving retrieval, and the Arbiter Agent for evidence adjudication and answer selection. Across five benchmarks and five LLM backbones, SEMA-RAG improves over the strongest baseline by 6.46 accuracy points on average, measured per backbone.
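
The division of labor described in the abstract can be illustrated with a toy control loop. The sketch below is not the authors' implementation: the corpus, the keyword-overlap retriever, the "at least two passages" sufficiency rule, and every name in it (CaseState, interpreter, explorer, arbiter, run_pipeline) are hypothetical stand-ins chosen so the example runs without an LLM. It only shows the shape of the workflow: interpret once, retrieve in rounds until a sufficiency check passes, then adjudicate over the collected evidence.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical three-passage corpus standing in for a medical knowledge base.
CORPUS = [
    "metformin is first-line therapy for type 2 diabetes",
    "metformin is contraindicated in severe renal impairment",
    "aspirin inhibits platelet aggregation",
]


def tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for semantic matching."""
    return set(text.lower().replace("?", " ").replace(",", " ").split())


@dataclass
class CaseState:
    question: str
    options: List[str]
    queries: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    rounds: int = 0


def interpreter(state: CaseState) -> None:
    """Interpreter stand-in: a real agent would rewrite the question into a
    clinically grounded query; here the question itself is the first query."""
    state.queries.append(state.question)


def retrieve(query: str, exclude: List[str], k: int = 1) -> List[str]:
    """Return up to k unseen passages ranked by keyword overlap with the query."""
    q = tokens(query)
    candidates = [d for d in CORPUS if d not in exclude and tokens(d) & q]
    candidates.sort(key=lambda d: len(tokens(d) & q), reverse=True)
    return candidates[:k]


def explorer(state: CaseState, min_passages: int = 2, max_rounds: int = 3) -> None:
    """Explorer stand-in: keep retrieving until the (assumed) sufficiency rule
    is met or the round budget runs out, expanding the query each round."""
    while state.rounds < max_rounds and len(state.evidence) < min_passages:
        state.evidence.extend(retrieve(state.queries[-1], state.evidence))
        state.rounds += 1
        # The next query is the question plus everything retrieved so far.
        state.queries.append(state.question + " " + " ".join(state.evidence))


def arbiter(state: CaseState) -> str:
    """Arbiter stand-in: pick the option with the most overlap with the evidence."""
    def support(option: str) -> int:
        return sum(len(tokens(option) & tokens(doc)) for doc in state.evidence)
    return max(state.options, key=support)


def run_pipeline(question: str, options: List[str]) -> Tuple[str, CaseState]:
    state = CaseState(question=question, options=options)
    interpreter(state)
    explorer(state)
    return arbiter(state), state
```

Calling run_pipeline("Which drug is first-line therapy for type 2 diabetes?", ["metformin", "aspirin"]) takes two retrieval rounds in this toy setup: the first query finds the first-line-therapy passage, and the expanded second query, which now contains "metformin", pulls in the contraindication passage. Keeping the three roles as separate functions over a shared state object mirrors the task decoupling the abstract argues for, though the real agents would be LLM-driven rather than rule-based.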

Yongfeng Huang, Ruiying Chen, James Cheng • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy 76.07 | 521 |
| Medical Question Answering | PubMedQA | Accuracy 59.2 | 117 |
| Medical Question Answering | MMLU Med | Accuracy 92.1 | 86 |
| Medical Question Answering | BioASQ | Accuracy 88.67 | 63 |
| Medical Question Answering | MedQA US | Accuracy 90.42 | 43 |
| Health-related dialogue and decision-making | HealthBench Main | Average Score 33.64 | 24 |
| Multi-turn clinical response generation | MAQuE (test) | Accuracy 61.5 | 2 |