CAAT-EHR: Cross-Attentional Autoregressive Transformer for Multimodal Electronic Health Record Embeddings
About
Electronic Health Records (EHRs) contain rich, longitudinal patient information across structured (e.g., labs, vitals, and imaging) and unstructured (e.g., clinical notes) modalities. While deep learning models such as RNNs and Transformers have advanced single- and multimodal EHR analysis, existing methods often optimize for specific downstream tasks and overlook the creation of generalizable patient representations that can be reused across multiple tasks. To address this gap, we propose CAAT-EHR, a novel Cross-Attentional Autoregressive Transformer architecture that produces task-agnostic, longitudinal embeddings of multimodal EHR data. In CAAT-EHR, self-attention layers capture temporal dependencies within each modality, while cross-attention layers fuse information across modalities to model complex interrelationships. During pre-training, an autoregressive decoder predicts future time steps from the fused embeddings, enforcing temporal consistency and enriching the encoder output. Once trained, the encoder alone generates versatile multimodal EHR embeddings that can be applied directly to a variety of predictive tasks. CAAT-EHR demonstrates significant improvements on benchmark EHR datasets for mortality prediction, ICU length-of-stay estimation, and Alzheimer's disease diagnosis prediction. Models using EHR embeddings generated by CAAT-EHR outperform models trained on raw EHR data in eleven out of twelve comparisons for F1 score and AUC across all three downstream tasks. Ablation studies confirm the critical roles of cross-modality fusion and autoregressive refinement. Overall, CAAT-EHR provides a unified framework for learning generalizable, temporally consistent multimodal EHR representations that support more reliable clinical decision support systems.
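The abstract describes a data flow with three parts: self-attention within each modality, cross-attention across modalities, and an autoregressive decoder that predicts future time steps during pre-training. It does not give layer sizes, projection weights, or the exact fusion operator, so the following is only a minimal pure-Python sketch of that flow under stated assumptions. It assumes two time-aligned modalities with equal feature width, no learned projection matrices, concatenation as the fusion step, and mean squared error as the next-step objective. The function names (`encode`, `next_step_loss`, `causal_self_attention`) are hypothetical and are not taken from the CAAT-EHR code.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    No learned projections (an assumption made for brevity): queries and keys
    must share a dimension, and each output row has the width of `values`.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def causal_self_attention(x):
    """Self-attention where step t may only attend to steps 0..t."""
    return [attention([x[t]], x[: t + 1], x[: t + 1])[0] for t in range(len(x))]


def encode(mod_a, mod_b):
    """Hypothetical encoder: per-modality self-attention, then cross-attention fusion."""
    if len(mod_a) != len(mod_b) or len(mod_a[0]) != len(mod_b[0]):
        raise ValueError("sketch assumes time-aligned modalities of equal width")
    # Self-attention within each modality models temporal dependencies.
    a = attention(mod_a, mod_a, mod_a)
    b = attention(mod_b, mod_b, mod_b)
    # Cross-attention: each modality queries the other one.
    a_from_b = attention(a, b, b)
    b_from_a = attention(b, a, a)
    # Fuse per time step by concatenation (an assumed fusion operator).
    return [x + y for x, y in zip(a_from_b, b_from_a)]


def next_step_loss(fused, targets):
    """Hypothetical autoregressive objective: predict step t+1 from steps 0..t."""
    if len(fused) < 2:
        raise ValueError("need at least two time steps")
    preds = causal_self_attention(fused)  # stand-in for a learned decoder
    total = 0.0
    for t in range(len(fused) - 1):
        pred, target = preds[t], targets[t + 1]
        total += sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)
    return total / (len(fused) - 1)


if __name__ == "__main__":
    random.seed(0)
    steps, width = 4, 3
    vitals = [[random.gauss(0, 1) for _ in range(width)] for _ in range(steps)]
    labs = [[random.gauss(0, 1) for _ in range(width)] for _ in range(steps)]
    fused = encode(vitals, labs)
    print(len(fused), len(fused[0]))  # 4 6
    targets = [x + y for x, y in zip(vitals, labs)]
    print(next_step_loss(fused, targets))
```

In this sketch, only the encoder path (`encode`) would be kept after pre-training to generate embeddings for downstream tasks, in line with the abstract. The `next_step_loss` side exists only to shape those embeddings. A real model would also learn projection weights and optimize this objective, which the sketch omits so that it stays focused on the data flow.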
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Alzheimer's disease diagnosis | ADNI | AUC | 87.6 | 24 |
| ICU length-of-stay prediction | MIMIC-III | F1 score | 65.7 | 14 |
| Mortality prediction | MIMIC-III | F1 score | 64.7 | 14 |