Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints
About
Clinical prediction from structured electronic health records (EHRs) is challenging due to high dimensionality, heterogeneity, class imbalance, and distribution shift. While tabular in-context learning (TICL) and retrieval-augmented methods perform well on generic benchmarks, their behavior in clinical settings remains unclear. We present a multi-cohort EHR benchmark comparing classical, deep tabular, and TICL models across varying data scale, feature dimensionality, outcome rarity, and cross-cohort generalization. PFN-based TICL models are sample-efficient in low-data regimes but degrade under naive distance-based retrieval as heterogeneity and imbalance increase. We propose AWARE, a task-aligned retrieval framework using supervised embedding learning and lightweight adapters. AWARE improves AUPRC by up to 12.2% under extreme imbalance, with gains increasing with data complexity. Our results identify retrieval quality and retrieval-inference alignment as key bottlenecks for deploying tabular in-context learning in clinical prediction.
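To make the contrast between naive distance-based retrieval and task-aligned retrieval concrete, here is a minimal sketch in plain Python. It is not AWARE's implementation: the abstract does not specify the embedding model or the adapters, so the helper `supervised_feature_scale` is a hypothetical stand-in that rescales each feature by how well it separates the two classes. The function names, the toy data, and the scoring rule are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def retrieve_context(rows, query, k, scale=None):
    """Return indices of the k rows nearest to `query` (Euclidean distance).

    `scale` optionally multiplies each feature difference before the
    distance is computed; with scale=None this is naive retrieval.
    """
    scale = scale or [1.0] * len(query)

    def dist(row):
        return math.sqrt(
            sum(((a - b) * s) ** 2 for a, b, s in zip(row, query, scale))
        )

    return sorted(range(len(rows)), key=lambda i: dist(rows[i]))[:k]


def supervised_feature_scale(rows, labels, eps=1e-8):
    """Hypothetical stand-in for a learned, label-aware embedding.

    Each feature is standardized (divide by std) and then weighted by its
    standardized class-mean gap, so the overall factor is gap / std**2.
    """
    scales = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        std = pstdev(col) + eps
        scales.append(abs(mean(pos) - mean(neg)) / std ** 2)
    return scales


# Toy data (assumed): feature 0 carries the label, feature 1 is
# large-magnitude noise that dominates raw Euclidean distance.
rows = [[0.0, 0.0], [0.1, 10.0], [1.0, 0.1], [0.9, 10.0]]
labels = [0, 0, 1, 1]
query = [0.05, 9.9]  # resembles class 0 on the informative feature

naive_idx = retrieve_context(rows, query, k=2)
aligned_idx = retrieve_context(
    rows, query, k=2, scale=supervised_feature_scale(rows, labels)
)

naive_labels = [labels[i] for i in naive_idx]
aligned_labels = [labels[i] for i in aligned_idx]
print("naive context labels:  ", naive_labels)
print("aligned context labels:", aligned_labels)
```

In this toy case the naive context mixes both classes because the noisy feature drives the distance, while the label-aware rescaling retrieves only class-0 neighbours. The same effect, at much larger scale and with rare outcomes, is the kind of retrieval-quality bottleneck the abstract describes.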
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Classification | Diabetes | F1 Score | 86.6 | 33 |
| Medical Image Classification | COVID-19 | F1 Score | 85.9 | 20 |
| Classification | ILP | AUROC | 78.3 | 19 |
| Regression | OXF-PT | Regression Metric | 0.04 | 19 |
| Regression | TIT | Regression Metric | 1.001 | 19 |
| Sepsis Prediction | MIMIC IV | AUROC | 0.918 | 19 |
| Urinary tract infection (UTI) prediction | MIMIC IV | AUROC | 66.6 | 19 |
| Ventilator-associated pneumonia (VAP) prediction | MIMIC IV | AUROC | 0.799 | 19 |
| Classification | SUPPORT2 | AUROC | 98.4 | 19 |
| Classification | DTC | AUROC | 0.99 | 19 |