A Clinically Validated Foundation Model for Comprehensive Lung Pathology Interpretation
About
Pathological assessment guides lung cancer diagnosis, treatment selection, and prognostic evaluation, yet current computational pathology (CPath) approaches rely on task-specific models built for isolated objectives. Although pan-cancer foundation models offer versatility, they lack subspecialty-level depth and have not been evaluated across clinical workflows or prospectively validated in real-world settings. We introduce PulmoFoundation, a foundation model for comprehensive lung pathology assessment across pre-operative, intra-operative, and post-operative care, validated in a multi-center prospective study and evaluated in a randomized controlled trial (RCT). Built upon Virchow2 via subspecialty-specific pretraining on ~40,000 diagnostic H&E-stained whole-slide images (WSIs), PulmoFoundation was systematically evaluated on ~26,000 WSIs across 32 clinically relevant tasks. In addition to accurately predicting molecular markers and patient survival, the model achieves clinical-grade performance on core diagnostic tasks across biopsy, frozen section, and surgical resection slides. In a registered prospective study of 1,357 patients across 11 diagnostic tasks, the model achieved an average area under the ROC curve (AUC) of 92.3%. Using pre-specified triage thresholds, PulmoFoundation could reduce the additional second-review burden for 68.8% of biopsies and 83.0% of frozen sections, and defer 44.5% of immunohistochemistry (IHC) stain orders, with positive predictive values (PPVs) of 1.0, 0.991, and 0.966, respectively. Beyond prospective validation, we conducted a crossover RCT with eight pathologists, in which AI assistance improved diagnostic accuracy across 4,928 case-reader pairs (91.7% with AI vs. 83.8% without AI). AI assistance also reduced median diagnostic time by 19.6%, increased diagnostic confidence by 8.7%, and improved inter-rater agreement from moderate (kappa = 0.56) to substantial (kappa = 0.76). Together, these evaluations support PulmoFoundation as a clinically validated decision-support system for lung pathology.
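To make the triage figures above concrete, the following is a minimal sketch of how a pre-specified threshold yields both a triaged fraction and a PPV. It is not the authors' implementation: the function name `triage_summary`, the toy scores, and the threshold of 0.9 are all hypothetical, and the abstract does not state how the thresholds were chosen.

```python
import math
from typing import Sequence, Tuple


def triage_summary(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (triaged_fraction, ppv) for cases with score >= threshold.

    scores: model confidence for the positive class (hypothetical values).
    labels: ground truth, 1 = positive, 0 = negative.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one case is required")

    # Ground-truth labels of the cases the model is confident enough to triage.
    triaged = [y for s, y in zip(scores, labels) if s >= threshold]

    triaged_fraction = len(triaged) / len(scores)
    # PPV = true positives / all cases called positive at this threshold.
    ppv = sum(triaged) / len(triaged) if triaged else math.nan
    return triaged_fraction, ppv


# Toy data, for illustration only (not from the study).
toy_scores = [0.99, 0.97, 0.60, 0.95, 0.30]
toy_labels = [1, 1, 0, 1, 0]
fraction, ppv = triage_summary(toy_scores, toy_labels, threshold=0.9)
print(f"triaged fraction = {fraction:.3f}, PPV = {ppv:.3f}")
```

Raising the threshold generally trades a smaller triaged fraction for a higher PPV, which is why both numbers are reported together.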
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Vascular Invasion Detection | Gastric Cancer Cohort H3 (external) | Mean AUC | 93.3 | 28 |
| Coarse-grained lung cancer subtyping | Cohort Resection specimens (Internal) | Macro AUC | 0.955 | 10 |
| Lymph Node Metastasis Prediction | LN Metastasis Resection Internal Cohort | Macro AUC | 97.5 | 10 |
| NSCLC subtyping | TCGA Cohort Resection | Macro AUC (PulmoFoundation) | 97.7 | 6 |
| Benign vs. Malignant Classification | Internal cohort Frozen (test) | Macro AUC | 98.6 | 5 |
| Benign vs. Malignant Classification | External H10 cohort Frozen (test) | Macro AUC | 0.97 | 5 |
| Benign vs. Malignant Classification | External H2 cohort Frozen (test) | Macro AUC | 0.999 | 5 |
| Biomarker Prediction | CK5/6 Biopsy (Internal) | Macro AUC | 89.8 | 5 |
| CK7 IHC status prediction | CK7 Internal | Macro AUC | 0.899 | 5 |
| CK7 IHC status prediction | CK7 External H3 | Macro AUC | 0.979 | 5 |
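Most rows in the table report a macro AUC. The page does not say how it was computed, so the sketch below assumes the common definition: a one-vs-rest AUC per class, averaged without weighting. The helper names `binary_auc` and `macro_auc` and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(
    prob_rows: Sequence[Sequence[float]], labels: Sequence[int], n_classes: int
) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed definition of macro AUC)."""
    per_class: List[float] = []
    for c in range(n_classes):
        class_scores = [row[c] for row in prob_rows]
        class_labels = [1 if y == c else 0 for y in labels]
        per_class.append(binary_auc(class_scores, class_labels))
    return sum(per_class) / n_classes


# Toy three-class predictions, for illustration only (not from the paper).
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
]
toy_labels = [0, 1, 2, 1]
print(f"macro AUC = {macro_auc(toy_probs, toy_labels, n_classes=3):.3f}")
```

The table lists some values on a 0 to 1 scale and others as percentages; multiplying the former by 100 puts them on the same scale for comparison.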