
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue

About

Large language models (LLMs) have demonstrated excellent capabilities in biomedical question answering, but their application in real-world clinical consultations still faces core challenges. Single-round consultation systems require patients to describe all symptoms upfront, which leads to vague diagnoses when complaints are unclear. Traditional multi-turn dialogue models, constrained by static supervised learning, lack flexibility and fail to intelligently extract key clinical information. To address these limitations, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty. The doctor agent continuously optimizes its questioning strategy within the RL framework through multi-turn interactions with the patient agent, dynamically adjusting its information-gathering path based on comprehensive rewards from the Consultation Evaluator. This RL fine-tuning mechanism enables LLMs to autonomously develop interaction strategies aligned with clinical reasoning logic, rather than superficially imitating patterns in existing dialogue data. Notably, we constructed MTMedDialog, the first English multi-turn medical consultation dataset capable of simulating patient interactions. Experiments demonstrate that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance. This approach shows immense practical value by reducing misdiagnosis risks in time-pressured settings, freeing clinicians for complex cases, and pioneering a strategy to optimize medical resource allocation and alleviate workforce shortages. Code and data are available at https://github.com/JarvisUSTC/DoctorAgent-RL.
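The interaction loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `rollout`, `toy_doctor`, `toy_patient`, and `toy_evaluator`, the `DIAGNOSIS:` prefix, and the reward formula (correctness minus a small per-turn cost) are all hypothetical stand-ins. In the real system the doctor and patient would be LLMs, and the RL policy update, which is omitted here, would consume the returned reward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical hidden case known only to the patient agent and the evaluator.
HIDDEN_CASE = {
    "symptoms": {"fever": "yes", "cough": "yes", "rash": "no"},
    "diagnosis": "flu",
}

@dataclass
class Episode:
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)
    diagnosis: str = ""
    reward: float = 0.0

def rollout(
    doctor_policy: Callable[[List[Tuple[str, str]]], str],
    patient_reply: Callable[[str], str],
    evaluate: Callable[[Episode], float],
    max_turns: int = 5,
) -> Episode:
    """Run one consultation: the doctor asks until it commits to a diagnosis."""
    episode = Episode()
    for _ in range(max_turns):
        action = doctor_policy(episode.turns)
        if action.startswith("DIAGNOSIS:"):
            episode.diagnosis = action[len("DIAGNOSIS:"):].strip()
            break
        episode.turns.append((action, patient_reply(action)))
    # The evaluator scores the whole trajectory; an RL trainer would use this value.
    episode.reward = evaluate(episode)
    return episode

def toy_doctor(history: List[Tuple[str, str]]) -> str:
    """Rule-based stand-in for the doctor LLM: ask two questions, then diagnose."""
    asked = {question for question, _ in history}
    for symptom in ("fever", "cough"):
        if symptom not in asked:
            return symptom
    confirmed = all(answer == "yes" for _, answer in history)
    return "DIAGNOSIS: flu" if confirmed else "DIAGNOSIS: unknown"

def toy_patient(question: str) -> str:
    """Stand-in for the patient LLM: answers only from the hidden case."""
    return HIDDEN_CASE["symptoms"].get(question, "not sure")

def toy_evaluator(episode: Episode) -> float:
    """Assumed reward: 1 for a correct diagnosis, minus 0.1 per question asked."""
    correct = 1.0 if episode.diagnosis == HIDDEN_CASE["diagnosis"] else 0.0
    return correct - 0.1 * len(episode.turns)

if __name__ == "__main__":
    result = rollout(toy_doctor, toy_patient, toy_evaluator)
    print(result.turns, result.diagnosis, result.reward)
```

The per-turn cost in this toy reward is only one way to express the trade-off between gathering information and reaching a diagnosis quickly; the abstract does not specify how the Consultation Evaluator combines its reward terms.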

Yichun Feng, Jiawei Wang, Lu Zhou, Zhen Lei, Yixue Li • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | MedQA | Accuracy: 58 | 96 |
| Question Answering | MMLU | Accuracy: 72.5 | 46 |
| Medical Reasoning | HealthBench Hard | Accuracy: 10.5 | 41 |
| Health-related dialogue and decision-making | HealthBench Main | Average Score: 15.77 | 22 |
| Medical Dialogue | MAQuE | Accuracy: 50 | 14 |
| Medical History Taking and Differential Diagnosis | MIMIC-IV processed | F1 Score: 28.4 | 12 |
