
Real-World Doctor Agent with Proactive Consultation through Multi-Agent Reinforcement Learning

About

Large language models (LLMs) struggle in real-world clinical consultations. Single-turn consultation systems require patients to describe all of their symptoms at once, which often leads to unclear complaints and vague diagnoses. Traditional dialogue models, constrained by static supervised learning, can only superficially imitate existing dialogue patterns. They cannot actively build understanding during dynamic interactions and therefore fail to achieve genuine clinical reasoning. To address these challenges, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework, and use it to train a doctor agent on Qwen2.5-7B-Instruct. Within this framework, a medical consultation is modeled as a dynamic decision-making process under uncertainty. The core intelligence of the doctor agent shifts from knowing the answer to learning and mastering a questioning methodology aimed at reaching an optimal diagnosis. Through strategic questioning, the agent draws out key patient information progressively over a multi-turn dialogue. To support this high-fidelity simulation of the real diagnostic process, we constructed MTMedDialog, a novel English multi-turn medical consultation dataset designed for dynamic, interactive training. To validate real-world effectiveness, we conducted rigorous evaluations, including blinded human assessments and trials with real patients. DoctorAgent-RL outperformed frontier models and achieved a 70% exact diagnostic match rate, confirming its potential as a collaborative tool. By handling initial screenings, it can free clinicians to focus on complex cases, helping to address physician shortages and misdiagnosis risks while easing the strain on healthcare resources.
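The abstract frames a consultation as sequential decision-making: the doctor agent chooses what to ask, a patient agent reveals information only when asked, and the episode ends with a diagnosis scored for an exact match. The toy Python sketch below illustrates that episode loop under explicit assumptions. The knowledge base, the `SimulatedPatient` class, the `consult` and `exact_match_rate` helpers, the greedy questioning rule, and the per-question penalty are all hypothetical and are not the paper's method or reward. No RL training is shown; in DoctorAgent-RL the questioning policy is an LLM trained with RL rather than a hand-written rule.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: each candidate diagnosis maps to its expected
# answers for a few findings. Purely illustrative, not clinical data.
KNOWLEDGE: dict[str, dict[str, str]] = {
    "condition_A": {"fever": "yes", "cough": "yes", "rash": "no"},
    "condition_B": {"fever": "yes", "cough": "no", "rash": "yes"},
    "condition_C": {"fever": "no", "cough": "yes", "rash": "no"},
}


@dataclass
class SimulatedPatient:
    """Hypothetical patient agent holding a hidden case."""
    facts: dict[str, str]
    true_diagnosis: str

    def answer(self, topic: str) -> str:
        # Reveal only the fact that was asked about; unmentioned findings default to "no".
        return self.facts.get(topic, "no")


def consult(patient, knowledge, max_turns=5, turn_penalty=0.05):
    """Ask questions until one diagnosis remains or the question budget runs out.

    Returns (diagnosis, reward, transcript). The reward is a hypothetical terminal
    reward: 1 for an exact diagnostic match, minus a small cost per question asked.
    """
    candidates = dict(knowledge)  # diagnoses still consistent with the dialogue so far
    transcript: list[tuple[str, str]] = []
    for _ in range(max_turns):
        if len(candidates) <= 1:
            break
        # Greedy rule (assumption): ask the first topic, alphabetically, whose
        # expected answer differs among the remaining candidates.
        topics = sorted({t for profile in candidates.values() for t in profile})
        topic = next(
            (t for t in topics
             if len({p.get(t, "no") for p in candidates.values()}) > 1),
            None,
        )
        if topic is None:  # remaining candidates cannot be told apart
            break
        reply = patient.answer(topic)
        transcript.append((topic, reply))
        # Keep only diagnoses consistent with the patient's reply.
        candidates = {d: p for d, p in candidates.items() if p.get(topic, "no") == reply}
    # Deterministic tie-break if several candidates survive the question budget.
    diagnosis = min(candidates) if candidates else "unknown"
    correct = diagnosis == patient.true_diagnosis
    reward = (1.0 if correct else 0.0) - turn_penalty * len(transcript)
    return diagnosis, reward, transcript


def exact_match_rate(predictions, references):
    """Fraction of cases whose predicted diagnosis exactly matches the reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty lists")
    hits = sum(p.strip().lower() == r.strip().lower() for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    patients = [SimulatedPatient(KNOWLEDGE[d], d) for d in KNOWLEDGE]
    refs = [p.true_diagnosis for p in patients]
    for budget in (5, 1):
        preds = [consult(p, KNOWLEDGE, max_turns=budget)[0] for p in patients]
        print(budget, preds, exact_match_rate(preds, refs))
```

With the full question budget, every toy case is diagnosed correctly. With a one-question budget, the `condition_C` patient is left ambiguous and is misdiagnosed, which illustrates why the number and order of questions matter. An RL-trained policy would learn that trade-off from reward instead of following a fixed rule.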

Yichun Feng, Jiawei Wang, Lu Zhou, Yikai Zheng, Zhen Lei, Yixue Li (2025)

Related benchmarks

| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Question Answering | MedQA | Accuracy | 58 | 96 |
| Question Answering | MMLU | Accuracy | 72.5 | 46 |
| Medical Reasoning | HealthBench Hard | Accuracy | 10.5 | 41 |
| Health-Related Dialogue and Decision-Making | HealthBench Main | Average Score | 15.77 | 24 |
| Medical Dialogue | MAQuE | Accuracy | 50 | 14 |
| Medical History Taking and Differential Diagnosis | MIMIC-IV (processed) | F1 Score | 28.4 | 12 |
