
Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning

About

We present EHRMIND, a practical recipe for adapting large language models (LLMs) to complex clinical reasoning tasks using reinforcement learning with verifiable rewards (RLVR). While RLVR has succeeded in mathematics and coding, its application to healthcare contexts presents unique challenges due to the specialized knowledge and reasoning required for electronic health record (EHR) interpretation. Our pilot study on the MEDCALC benchmark reveals two key failure modes: (1) misapplied knowledge, where models possess relevant medical knowledge but apply it incorrectly, and (2) missing knowledge, where models lack essential domain knowledge. To address these cases, EHRMIND applies a two-stage solution: a lightweight supervised fine-tuning (SFT) warm-up that injects missing domain knowledge, stabilizes subsequent training, and encourages structured, interpretable outputs; followed by RLVR, which reinforces outcome correctness and refines the model's decision-making. We demonstrate the effectiveness of our method across diverse clinical applications, including medical calculations (MEDCALC), patient-trial matching (TREC CLINICAL TRIALS), and disease diagnosis (EHRSHOT). EHRMIND delivers consistent gains in accuracy, interpretability, and cross-task generalization. These findings offer practical guidance for applying RLVR to enhance LLM capabilities in healthcare settings.

Jiacheng Lin, Zhenbang Wu, Jimeng Sun • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Long Length of Stay | EHRSHOT Long Length of Stay | Accuracy: 69.41 | 6 |
| Anemia prediction | EHRSHOT (test) | Accuracy: 44.57 | 6 |
| 30-day Readmission | EHRSHOT 30-day Readmission | Accuracy: 46.56 | 6 |
| Acute Myocardial Infarction prediction | EHRSHOT (test) | Accuracy: 88.38 | 6 |
