Large Language Models are Powerful Electronic Health Record Encoders
About
Electronic Health Records (EHRs) offer considerable potential for clinical prediction, but their complexity and heterogeneity challenge traditional machine learning. Domain-specific EHR foundation models trained on unlabeled EHR data have shown improved predictive accuracy and generalization. However, their development is constrained by limited data access and site-specific vocabularies. We convert EHR data into plain text by replacing medical codes with natural-language descriptions, enabling general-purpose Large Language Models (LLMs) to produce high-dimensional embeddings for downstream prediction tasks without access to private medical training data. LLM-based embeddings perform on par with a specialized EHR foundation model, CLMBR-T-Base, across 15 clinical tasks from the EHRSHOT benchmark. In an external validation using the UK Biobank, an LLM-based model shows statistically significant improvements for some tasks, which we attribute to higher vocabulary coverage and slightly better generalization. Overall, we reveal a trade-off between the computational efficiency of specialized EHR models and the portability and data independence of LLM-based embeddings.
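The core idea above, replacing site-specific medical codes with natural-language descriptions and embedding the resulting text with a general-purpose LLM, can be sketched as follows. The code-to-description mapping, the example patient record, and the hash-based `embed` stand-in (used in place of a real frozen LLM encoder, just to keep the sketch self-contained) are all illustrative assumptions, not the paper's actual vocabulary or model.

```python
import hashlib

# Hypothetical mapping from coded vocabulary entries to plain-text descriptions.
CODE_DESCRIPTIONS = {
    "ICD10:E11.9": "Type 2 diabetes mellitus without complications",
    "LOINC:4548-4": "Hemoglobin A1c measurement",
    "RxNorm:860975": "Metformin 500 mg oral tablet",
}

def serialize_record(events):
    """Turn a list of (date, code) events into plain text by replacing
    each medical code with its natural-language description."""
    lines = []
    for date, code in events:
        desc = CODE_DESCRIPTIONS.get(code, code)  # fall back to the raw code
        lines.append(f"{date}: {desc}")
    return "\n".join(lines)

def embed(text, dim=16):
    """Stand-in for an LLM embedding call; a deterministic hash-based
    vector so the sketch runs without any private data or model weights."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

# Illustrative patient timeline (not real data).
record = [
    ("2021-03-02", "ICD10:E11.9"),
    ("2021-03-02", "LOINC:4548-4"),
    ("2021-04-10", "RxNorm:860975"),
]

text = serialize_record(record)
vector = embed(text)  # downstream: fit a lightweight classifier on these vectors
```

In the paper's setup, the resulting embeddings feed a simple supervised head per clinical task; the LLM itself never sees private medical training data.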
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Chest X-ray Finding Prediction | EHRSHOT Chest X-ray Findings | AUROC | 0.616 | 20 |
| Clinical prediction | EHRSHOT Chest X-ray Findings | AUPRC | 60.9 | 20 |
| Operational Outcome Prediction | EHRSHOT Operational Outcomes | AUROC | 77.5 | 20 |
| New Diagnosis Prediction | EHRSHOT Assignment of New Diagnoses | AUROC | 0.709 | 20 |
| Clinical prediction | EHRSHOT Overall 1.0 (test) | AUROC | 69.9 | 20 |
| Clinical prediction | EHRSHOT Assignment of New Diagnoses | AUPRC | 17.9 | 20 |
| Clinical prediction | EHRSHOT Overall | AUPRC | 39.1 | 20 |
| Clinical prediction | EHRSHOT Anticipating Labs | AUPRC | 64.9 | 20 |
| Lab Result Prediction | EHRSHOT Anticipating Labs | AUROC | 0.657 | 20 |