Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
About
The professionalism of a human doctor in outpatient service depends on two core abilities: making accurate medical decisions and conducting strategic, empathetic patient inquiry. Existing Large Language Models (LLMs) have achieved remarkable accuracy on medical decision-making benchmarks. However, they often lack the ability to conduct strategic and empathetic consultation, which is essential in real-world clinical scenarios. To address this gap, we propose Doctor-R1, an AI doctor agent trained to master both capabilities by asking high-yield questions and conducting strategic multi-turn inquiry to guide decision-making. Our framework introduces three key components: a multi-agent interactive environment, a two-tiered reward architecture that separately optimizes clinical decision-making and communicative inquiry skills, and an experience repository that grounds policy learning in high-quality prior trajectories. We evaluate Doctor-R1 on OpenAI's HealthBench and MAQuE using multi-faceted metrics such as communication quality, user experience, and task accuracy. Doctor-R1 surpasses state-of-the-art open-source specialized LLMs by a substantial margin with higher parameter efficiency, and it outperforms powerful proprietary models. Furthermore, human expert evaluations show that Doctor-R1 achieves superior clinical capability and patient-centric performance, demonstrating the effectiveness of the framework.
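To make the two-tiered reward and the experience repository more concrete, here is a minimal, hypothetical sketch. The abstract does not give reward formulas or a retrieval method, so everything below is an assumption for illustration: the names `Turn`, `inquiry_reward`, `decision_reward`, `turn_reward`, `ExperienceRepository`, and `min_reward` are invented here; inquiry turns are scored by averaging rubric scores (standing in for whatever judge the authors use); final decisions are scored by exact match; and token-level Jaccard similarity is swapped in for the repository's retrieval. It is not Doctor-R1's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# HYPOTHETICAL sketch: names, scoring rules, and retrieval are assumptions,
# not the Doctor-R1 authors' method.


@dataclass
class Turn:
    """One agent turn: either an inquiry (question) or a final decision."""
    text: str
    is_final: bool
    # Assumed rubric scores in [0, 1] for an inquiry turn (e.g. from a judge model).
    rubric_scores: dict = field(default_factory=dict)


def inquiry_reward(turn: Turn) -> float:
    """Tier for communicative inquiry: mean rubric score (0.0 if unscored)."""
    if not turn.rubric_scores:
        return 0.0
    return sum(turn.rubric_scores.values()) / len(turn.rubric_scores)


def decision_reward(turn: Turn, gold_answer: str) -> float:
    """Tier for clinical decision-making: 1.0 on case-insensitive exact match."""
    return 1.0 if turn.text.strip().lower() == gold_answer.strip().lower() else 0.0


def turn_reward(turn: Turn, gold_answer: str) -> float:
    """Route each turn to the tier that scores its skill, keeping signals separate."""
    return decision_reward(turn, gold_answer) if turn.is_final else inquiry_reward(turn)


class ExperienceRepository:
    """Stores high-reward trajectories and retrieves them by token overlap."""

    def __init__(self, min_reward: float):
        self.min_reward = min_reward
        self.items: list[tuple[str, list[Turn], float]] = []

    def add(self, case: str, trajectory: list[Turn], reward: float) -> bool:
        # Keep only trajectories that clear the quality threshold.
        if reward >= self.min_reward:
            self.items.append((case, trajectory, reward))
            return True
        return False

    def retrieve(self, case: str, k: int = 1) -> list[list[Turn]]:
        query = set(case.lower().split())

        def jaccard(other: str) -> float:
            tokens = set(other.lower().split())
            union = query | tokens
            return len(query & tokens) / len(union) if union else 0.0

        # Rank by similarity to the new case, breaking ties by stored reward.
        ranked = sorted(self.items, key=lambda it: (jaccard(it[0]), it[2]), reverse=True)
        return [traj for _, traj, _ in ranked[:k]]


if __name__ == "__main__":
    trajectory = [
        Turn("How long have you had the fever?", False, {"relevance": 1.0, "empathy": 0.5}),
        Turn("influenza", True),
    ]
    rewards = [turn_reward(t, "Influenza") for t in trajectory]
    print(rewards)  # [0.75, 1.0]
    repo = ExperienceRepository(min_reward=0.8)
    repo.add("fever and cough for three days", trajectory, sum(rewards) / len(rewards))
    print(len(repo.retrieve("cough and fever since yesterday")))  # 1
```

Routing each turn to exactly one tier is one simple way to keep the inquiry and decision signals separate; a real system would likely use learned or LLM-based judges and a semantic retriever instead of exact match and Jaccard overlap.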
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | MedQA | Accuracy | 83.5 | 96 |
| Question Answering | MMLU | Accuracy | 85 | 46 |
| Medical Reasoning | HealthBench Hard | Accuracy | 24.27 | 41 |
| Health-related dialogue and decision-making | HealthBench Main | Average Score | 36.29 | 22 |
| Medical Dialogue | MAQuE | Accuracy | 60 | 14 |
| Clinical Diagnostic Reasoning | Clinical Diagnostic Reasoning Benchmark 1.0 (test) | ICD Recall | 38.73 | 13 |