Echo-E$^3$Net: Efficient Endocardial Spatio-Temporal Network for Ejection Fraction Estimation
About
Objective: To develop a robust and computationally efficient deep learning model for automated left ventricular ejection fraction (LVEF) estimation from echocardiography videos that is suitable for real-time point-of-care ultrasound (POCUS) deployment.

Methods: We propose Echo-E$^3$Net, an endocardial spatio-temporal network that explicitly incorporates cardiac anatomy into LVEF prediction. The model comprises a dual-phase Endocardial Border Detector (E$^2$CBD) that uses phase-specific cross attention to localize end-diastolic and end-systolic endocardial landmarks and to learn phase-aware landmark embeddings, and an Endocardial Feature Aggregator (E$^2$FA) that fuses these embeddings with global statistical descriptors of deep feature maps to refine EF regression. Training is guided by a multi-component loss inspired by Simpson's biplane method that jointly supervises EF and landmark geometry. We evaluate Echo-E$^3$Net on the EchoNet-Dynamic dataset using RMSE and R$^2$ while reporting parameter count and GFLOPs to characterize efficiency.

Results: On EchoNet-Dynamic, Echo-E$^3$Net achieves an RMSE of 5.20 and an R$^2$ score of 0.82 while using only 1.55M parameters and 8.05 GFLOPs. The model operates without external pre-training, heavy data augmentation, or test-time ensembling, supporting practical real-time deployment.

Conclusion: By combining phase-aware endocardial landmark modeling with lightweight spatio-temporal feature aggregation, Echo-E$^3$Net improves the efficiency and robustness of automated LVEF estimation and is well-suited for scalable clinical use in POCUS settings.

Code is available at https://github.com/moeinheidari7829/Echo-E3Net
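For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the textbook Simpson's biplane (method-of-disks) volume, the ejection fraction derived from end-diastolic and end-systolic volumes, and the RMSE and R$^2$ metrics. It is not taken from the Echo-E$^3$Net code base: the function names, the use of plain Python lists, and the assumption that disk diameters are already measured in two orthogonal views are all illustrative choices, and the paper's actual loss is only described as being inspired by this method.

```python
import math


def simpson_biplane_volume(diam_view_a, diam_view_b, long_axis_length):
    """Method-of-disks volume: V = (pi / 4) * sum(a_i * b_i) * (L / N).

    diam_view_a, diam_view_b: hypothetical per-disk diameters from two
    orthogonal views (same length N); long_axis_length: ventricle length L.
    """
    if len(diam_view_a) != len(diam_view_b) or not diam_view_a:
        raise ValueError("need two non-empty diameter lists of equal length")
    n_disks = len(diam_view_a)
    disk_height = long_axis_length / n_disks
    area_terms = sum(a * b for a, b in zip(diam_view_a, diam_view_b))
    return math.pi / 4.0 * area_terms * disk_height


def ejection_fraction(edv, esv):
    """EF in percent from end-diastolic (EDV) and end-systolic (ESV) volume."""
    if edv <= 0:
        raise ValueError("EDV must be positive")
    return 100.0 * (edv - esv) / edv


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted EF values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In this sketch, a landmark-based pipeline would obtain the per-disk diameters from the predicted endocardial border at each phase; how Echo-E$^3$Net maps its landmarks to geometry is not specified in the abstract, so that step is deliberately left out.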
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| LVEF estimation | EchoNet-Dynamic (test) | MAE 3.93 | 20 |