Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning

About

The professionalism of a human doctor in outpatient service depends on two core abilities: making accurate medical decisions and conducting strategic, empathetic patient inquiry during consultation. Existing Large Language Models (LLMs) have achieved remarkable accuracy on medical decision-making benchmarks. However, they often lack the ability to conduct strategic and empathetic consultation, which is essential in real-world clinical scenarios. To address this gap, we propose Doctor-R1, an AI doctor agent trained to master both capabilities by asking high-yield questions and conducting strategic multi-turn inquiry to guide decision-making. Our framework introduces three key components: a multi-agent interactive environment, a two-tiered reward architecture that separately optimizes clinical decision-making and communicative inquiry skills, and an experience repository that grounds policy learning in high-quality prior trajectories. We evaluate Doctor-R1 on OpenAI's HealthBench and MAQuE across multi-faceted metrics such as communication quality, user experience, and task accuracy. Doctor-R1 surpasses state-of-the-art open-source specialized LLMs by a substantial margin with higher parameter efficiency, and it outperforms powerful proprietary models. Furthermore, human expert evaluations show that Doctor-R1 achieves superior clinical capability and patient-centric performance, demonstrating the effectiveness of the framework.

Yunghwei Lai, Kaiming Liu, Ziyue Wang, Weizhi Ma, Yang Liu • 2025

Related benchmarks

Task	Dataset	Result
Question Answering	MedQA	Accuracy 83.5	96
Question Answering	MMLU	Accuracy 85	46
Medical Reasoning	HealthBench Hard	Accuracy 24.27	41
Health-related dialogue and decision-making	HealthBench Main	Average Score 36.29	24
Medical Dialogue	MAQuE	Accuracy 60	14
Clinical Diagnostic Reasoning	Clinical Diagnostic Reasoning Benchmark 1.0 (test)	ICD Recall 38.73	13
Medical Diagnosis	Dynamic Multi-turn Diagnostic Evaluation Chinese	Strict DA 33.5	12
