Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
About
Foundation models are rapidly being developed for computational pathology applications. However, it remains an open question which factors matter most for downstream performance, with data scale and diversity, model size, and training algorithm all playing a role. In this work, we propose algorithmic modifications tailored for pathology, and we present the results of scaling both data and model size, surpassing previous studies in both dimensions. We introduce three new models: Virchow2, a 632 million parameter vision transformer; Virchow2G, a 1.9 billion parameter vision transformer; and Virchow2G Mini, a 22 million parameter distillation of Virchow2G. Each is trained on 3.1 million histopathology whole slide images spanning diverse tissues, originating institutions, and stains. We achieve state-of-the-art performance on 12 tile-level tasks compared to the top-performing competing models. Our results suggest that data diversity and domain-specific methods can outperform models that only scale in the number of parameters, but, on average, performance benefits from the combination of domain-specific methods, data scale, and model scale.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Survival prediction | TCGA-LUAD | C-index | 0.612 | 116 |
| WSI classification | NTUH-Ki67-Liver (5-fold cross-validation) | Balanced accuracy | 92.2 | 98 |
| Survival prediction | TCGA-COADREAD | C-index | 0.625 | 67 |
| Survival prediction | TCGA-STAD | C-index | 0.605 | 52 |
| Slide-level classification | Camelyon16 | -- | -- | 52 |
| Survival prediction | TCGA-KIRC | C-index | 0.694 | 50 |
| Cancer subtyping | lung cancer cohort H1 (internal) | Mean AUC | 0.97 | 46 |
| Few-shot classification | CPTAC, k=25 positive samples (test) | LUNG ST accuracy | 83.9 | 45 |
| Few-shot classification | CPTAC | LUNG ST accuracy | 76.5 | 45 |
| Survival prediction | TCGA-BRCA (test) | C-index | 0.622 | 41 |
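The survival-prediction rows above all report the concordance index (C-index): the fraction of comparable patient pairs in which the model assigns the higher risk score to the patient who experiences the event earlier. A minimal sketch of Harrell's C-index in plain Python (the function name and the simplified handling of tied times are our own; benchmark implementations such as those in lifelines or scikit-survival handle ties more carefully):

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's C-index (sketch).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the patient was censored
    risk_scores: model-predicted risk (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # order the pair so that patient i has the earlier time
        if times[j] < times[i]:
            i, j = j, i
        # a pair is comparable only if the earlier time is an observed event;
        # pairs with tied times are skipped here for simplicity
        if not events[i] or times[i] == times[j]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    return concordant / comparable
```

A random risk score yields a C-index of about 0.5 and a perfect ranking yields 1.0, which is why values in the 0.6-0.7 range, as in the table, indicate a modest but real prognostic signal.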