
MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments

About

Real-world clinical diagnosis is a complex process in which the doctor must gather information both by interacting with the patient and by conducting medical exams. The doctor must also adapt to different patient personas and to noisy or incomplete information that can arise at any point in the process. However, existing benchmarks for medical LLMs and methods for automatic diagnosis largely simplify this process, reducing it to single-turn question answering, noise-free conversations, or sequential exam making, and thereby ignore the interactive and uncertain nature of clinical diagnosis. In this paper, we address this gap by formalizing clinical diagnosis as a Partially Observable Markov Decision Process (POMDP) with three action types: questioning the patient, ordering medical exams as tool calls, and issuing a diagnosis. We also introduce a systematic noise model comprising seven patient noise types and three exam noise types. Using our proposed environment, we train a diagnosis agent, MedExAgent, through a two-stage pipeline that first performs supervised finetuning on synthetic conversations structured after the Calgary-Cambridge model for clinical interviews, and then applies DAPO to optimize a composite reward capturing diagnostic accuracy, tool call quality, and exam cost, where exam cost includes both financial cost and patient discomfort. Through extensive experiments and ablation studies, we demonstrate that MedExAgent achieves diagnostic performance comparable to larger models while maintaining cost-efficient examination strategies.
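
To make the action space and reward structure described above concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the abstract names the three action types and the three reward components, but the names `ActionType`, `Exam`, and `composite_reward`, the linear combination, the weights, and the cost units are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(Enum):
    """The three action types named in the abstract."""
    ASK = "ask"            # question the patient
    EXAM = "exam"          # order a medical exam, issued as a tool call
    DIAGNOSE = "diagnose"  # issue a diagnosis


@dataclass
class Exam:
    """A hypothetical exam record; the fields and units are assumptions."""
    name: str
    financial_cost: float  # assumed normalized to [0, 1]
    discomfort: float      # assumed normalized to [0, 1]


def composite_reward(accuracy, tool_call_quality, exams,
                     w_acc=1.0, w_tool=0.5, w_cost=0.1):
    """Combine the three reward components from the abstract.

    The linear form and the default weights are illustrative assumptions;
    the abstract does not specify how the components are combined.
    """
    # Exam cost covers both financial cost and patient discomfort.
    exam_cost = sum(e.financial_cost + e.discomfort for e in exams)
    return w_acc * accuracy + w_tool * tool_call_quality - w_cost * exam_cost
```

Under these assumed weights, an episode with a correct diagnosis and well-formed tool calls scores lower as more expensive or uncomfortable exams are ordered, which is the trade-off the abstract describes as cost-efficient examination.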

Yicheng Gao, Xiaolin Zhou, Yahan Li, Yue Zhao, Ruishan Liu • 2026

Related benchmarks

Task               Dataset                          Metric            Result  Rank
Medical Diagnosis  AgentClinic OOD original (test)  Similarity (Sim)  0.672   20
Medical Diagnosis  DDXPlus                          Similarity        96.6    12
Medical Diagnosis  PMC-Patients                     Similarity Score  62.6    12
Medical Diagnosis  DDXPlus original (test)          Similarity Score  0.953   8
Medical Diagnosis  PMC-Patients original (test)     Similarity        62.6    8
