EchoJEPA: A Latent Predictive Foundation Model for Echocardiography
About
Foundation models for echocardiography often struggle to disentangle anatomical signal from the stochastic speckle and acquisition artifacts inherent to ultrasound. We present EchoJEPA, a foundation model trained on 18 million echocardiograms from 300K patients, the largest pretraining corpus for this modality to date. By using a latent predictive objective, EchoJEPA learns anatomical representations that are robust to speckle noise. We validate this with a novel multi-view probing framework on frozen backbones, where EchoJEPA outperforms leading baselines by approximately 20% in left ventricular ejection fraction (LVEF) estimation and 17% in right ventricular systolic pressure (RVSP) estimation. The model is also highly sample-efficient, reaching 79% view classification accuracy with only 1% of the labeled data, versus 42% for the best baseline trained on 100% of it. EchoJEPA also generalizes better, degrading by only 2% under physics-informed acoustic perturbations compared with 17% for competing models. Notably, its zero-shot performance on pediatric patients surpasses fully fine-tuned baselines, establishing latent prediction as a superior paradigm for robust, generalizable medical AI.
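The key idea behind a latent predictive (JEPA-style) objective is that the loss is computed between predicted and target *embeddings*, not pixels, so the model is never rewarded for reproducing unpredictable speckle. The sketch below illustrates that generic idea under toy assumptions: a one-layer `encode` stands in for a real video encoder, the predictor simply pools visible-patch latents and adds a positional embedding, the target encoder tracks the context encoder by EMA, and the L1 distance and momentum value are assumptions. All names are hypothetical; this is not EchoJEPA's implementation.

```python
import numpy as np

N_PATCHES, D_IN, D_LAT = 16, 32, 8  # toy sizes, purely illustrative


def encode(x, W):
    """Hypothetical stand-in for a video encoder: one linear map followed by tanh."""
    return np.tanh(x @ W)


def latent_prediction_loss(patches, mask, W_ctx, W_tgt, W_pred, pos):
    """Predict target-encoder latents of masked patches from the visible patches.

    Nothing is reconstructed in pixel space; the loss lives entirely in latent space.
    """
    targets = encode(patches, W_tgt)[mask]        # (n_masked, D_LAT); treated as constants
    context = encode(patches[~mask], W_ctx)       # context encoder sees visible patches only
    summary = context.mean(axis=0)                # toy predictor: pool the context...
    preds = (summary + pos[mask]) @ W_pred        # ...then condition on each masked position
    return float(np.abs(preds - targets).mean())  # L1 distance in latent space (assumption)


def ema_update(W_tgt, W_ctx, momentum=0.996):
    """Move the target encoder toward the context encoder by exponential moving average."""
    return momentum * W_tgt + (1.0 - momentum) * W_ctx


# Usage on random data: mask half the patches, compute the loss, update the target encoder.
rng = np.random.default_rng(0)
patches = rng.normal(size=(N_PATCHES, D_IN))
mask = np.zeros(N_PATCHES, dtype=bool)
mask[rng.choice(N_PATCHES, size=N_PATCHES // 2, replace=False)] = True
W_ctx = rng.normal(scale=0.1, size=(D_IN, D_LAT))
W_tgt = W_ctx.copy()
W_pred = rng.normal(scale=0.1, size=(D_LAT, D_LAT))
pos = rng.normal(scale=0.1, size=(N_PATCHES, D_LAT))

loss = latent_prediction_loss(patches, mask, W_ctx, W_tgt, W_pred, pos)
print(f"latent L1 loss: {loss:.4f}")
W_tgt = ema_update(W_tgt, W_ctx)
```

In a real training loop only the context encoder and predictor receive gradients; the EMA target avoids the trivial solution where both encoders collapse to a constant.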
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| LVEF estimation | EchoNet-Pediatric | MAE (% points) | 3.88 | 17 |
| LVEF estimation | Stanford | MAE (% points, Original) | 3.97 | 5 |
| LVEF estimation | Toronto (internal) | MAE (% points) | 4.26 | 5 |
| LVEF estimation | Chicago (cross-site generalization) | MAE (% points) | 5.44 | 5 |
| LVEF estimation | EchoNet-Dynamic Stanford | MAE (% points) | 3.97 | 5 |
| RVSP estimation | Toronto | MAE (mmHg) | 4.54 | 5 |
| RVSP estimation | Chicago | MAE (mmHg) | 4.91 | 5 |
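The MAE figures above come from probes trained on top of frozen backbones. As a rough illustration of that evaluation pattern, the sketch below mean-pools per-view embeddings into one vector per study, fits a closed-form ridge regression probe without touching the encoder, and reports MAE. Mean pooling, the ridge probe, the synthetic data, and every function name are assumptions for illustration; the paper's multi-view probing framework may fuse views and train probes differently. `relative_reduction` shows one common way to read a claim such as "outperforms by approximately 20%".

```python
import numpy as np


def pool_views(view_embeddings):
    """Fuse one study's per-view embeddings, shape (n_views, d), by mean pooling.

    Mean pooling is an assumed fusion rule, not necessarily the paper's.
    """
    return np.mean(view_embeddings, axis=0)


def _with_bias(X):
    """Append a constant column so the probe learns an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge_probe(X, y, lam=1.0):
    """Closed-form ridge regression on frozen features; the encoder is never updated."""
    Xb = _with_bias(X)
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(w, X):
    return _with_bias(X) @ w


def mae(y_true, y_pred):
    """Mean absolute error in the target's units (LVEF: percentage points; RVSP: mmHg)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def relative_reduction(baseline_mae, model_mae):
    """Fractional MAE reduction versus a baseline, one reading of 'outperforms by ~20%'."""
    return (baseline_mae - model_mae) / baseline_mae


# Usage on synthetic data standing in for frozen-encoder outputs.
rng = np.random.default_rng(0)
n_studies, n_views, d = 200, 3, 16
emb = rng.normal(size=(n_studies, n_views, d))  # (studies, views, features)
X = np.stack([pool_views(e) for e in emb])       # (studies, features)
y = 55.0 + 5.0 * (X @ rng.normal(size=d)) + rng.normal(scale=2.0, size=n_studies)
w = fit_ridge_probe(X[:150], y[:150])
print(f"held-out MAE: {mae(y[150:], predict(w, X[150:])):.2f}")
```

Because only the small probe is fit, differences in held-out MAE across backbones reflect the quality of the frozen representations rather than differences in fine-tuning.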