Large Language Models are Powerful Electronic Health Record Encoders
About
Electronic Health Records (EHRs) offer considerable potential for clinical prediction, but their complexity and heterogeneity challenge traditional machine learning. Domain-specific EHR foundation models trained on unlabeled EHR data have shown improved predictive accuracy and generalization. However, their development is constrained by limited data access and site-specific vocabularies. We convert EHR data into plain text by replacing medical codes with natural-language descriptions, enabling general-purpose Large Language Models (LLMs) to produce high-dimensional embeddings for downstream prediction tasks without access to private medical training data. LLM-based embeddings perform on par with a specialized EHR foundation model, CLMBR-T-Base, across 15 clinical tasks from the EHRSHOT benchmark. In an external validation using the UK Biobank, an LLM-based model shows statistically significant improvements for some tasks, which we attribute to higher vocabulary coverage and slightly better generalization. Overall, we reveal a trade-off between the computational efficiency of specialized EHR models and the portability and data independence of LLM-based embeddings.
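The core idea above, replacing site-specific medical codes with natural-language descriptions and embedding the resulting text with a general-purpose LLM, can be sketched as follows. The code-to-description mapping, the example patient record, and the hash-based `embed` stand-in (used in place of a real frozen LLM encoder, just to keep the sketch self-contained) are all illustrative assumptions, not the paper's actual vocabulary or model.

```python
import hashlib

# Hypothetical mapping from coded vocabulary entries to plain-text descriptions.
CODE_DESCRIPTIONS = {
    "ICD10:E11.9": "Type 2 diabetes mellitus without complications",
    "LOINC:4548-4": "Hemoglobin A1c measurement",
    "RxNorm:860975": "Metformin 500 mg oral tablet",
}

def serialize_record(events):
    """Turn a list of (date, code) events into plain text by replacing
    each medical code with its natural-language description."""
    lines = []
    for date, code in events:
        desc = CODE_DESCRIPTIONS.get(code, code)  # fall back to the raw code
        lines.append(f"{date}: {desc}")
    return "\n".join(lines)

def embed(text, dim=16):
    """Stand-in for an LLM embedding call; a deterministic hash-based
    vector so the sketch runs without any private data or model weights."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

# Illustrative patient timeline (not real data).
record = [
    ("2021-03-02", "ICD10:E11.9"),
    ("2021-03-02", "LOINC:4548-4"),
    ("2021-04-10", "RxNorm:860975"),
]

text = serialize_record(record)
vector = embed(text)  # downstream: fit a lightweight classifier on these vectors
```

In the paper's setup, the resulting embeddings feed a simple supervised head per clinical task; the LLM itself never sees private medical training data.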
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Chest X-ray Finding Prediction | EHRSHOT Chest X-ray Findings | AUROC | 0.616 | 20 |
| Clinical prediction | EHRSHOT Chest X-ray Findings | AUPRC | 60.9 | 20 |
| Operational Outcome Prediction | EHRSHOT Operational Outcomes | AUROC | 77.5 | 20 |
| New Diagnosis Prediction | EHRSHOT Assignment of New Diagnoses | AUROC | 0.709 | 20 |
| Clinical prediction | EHRSHOT Overall 1.0 (test) | AUROC | 69.9 | 20 |
| Clinical prediction | EHRSHOT Assignment of New Diagnoses | AUPRC | 17.9 | 20 |
| Clinical prediction | EHRSHOT Overall | AUPRC | 39.1 | 20 |
| Clinical prediction | EHRSHOT Anticipating Labs | AUPRC | 64.9 | 20 |
| Lab Result Prediction | EHRSHOT Anticipating Labs | AUROC | 0.657 | 20 |