Vision Foundation Models for Computed Tomography
About
Foundation models (FMs) have shown transformative potential in radiology by performing diverse, complex tasks across imaging modalities. Here, we developed CT-FM, a large-scale, 3D image-based pre-trained model designed explicitly for a range of radiological tasks. CT-FM was pre-trained on 148,000 computed tomography (CT) scans from the Imaging Data Commons using label-agnostic contrastive learning. We evaluated CT-FM on four categories of tasks (whole-body and tumor segmentation, head CT triage, medical image retrieval, and semantic understanding), where it showed superior performance compared with state-of-the-art models. Beyond these quantitative results, CT-FM clustered regions anatomically and identified similar anatomical and structural concepts across scans. It also remained robust in test-retest settings, and the salient regions associated with its embeddings were reasonable. This study demonstrates the value of large-scale medical imaging foundation models and, by open-sourcing the model weights, code, and data, aims to support more adaptable, reliable, and interpretable AI solutions in radiology.
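The abstract says only that pre-training used label-agnostic contrastive learning. To illustrate what such an objective looks like, here is a minimal sketch of the generic NT-Xent (SimCLR-style InfoNCE) loss in plain Python. This is an assumption chosen for illustration, not CT-FM's actual training code. The function names, the toy 2-D embeddings, and the default temperature of 0.1 are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nt_xent_loss(z_a: Sequence[Vector], z_b: Sequence[Vector],
                 temperature: float = 0.1) -> float:
    """Hypothetical NT-Xent (SimCLR-style InfoNCE) loss for one batch.

    z_a[i] and z_b[i] embed two augmented views of the same scan (a positive
    pair). All other embeddings in the batch serve as negatives, so no
    labels are needed.
    """
    if not z_a or len(z_a) != len(z_b):
        raise ValueError("z_a and z_b must be non-empty and equally long")
    n = len(z_a)
    z = [_normalize(v) for v in list(z_a) + list(z_b)]  # 2n unit vectors
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of sample i
        # Temperature-scaled cosine similarity to every other embedding.
        sims = {j: _dot(z[i], z[j]) / temperature
                for j in range(2 * n) if j != i}
        # Numerically stable log-sum-exp over all candidates k != i.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Negative log of the softmax probability assigned to the positive.
        total += log_denom - sims[pos]
    return total / (2 * n)


# Usage: matching views give a near-zero loss; mismatched views do not.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned={aligned:.4f} swapped={swapped:.4f}")
```

The loss pulls the two views of each scan together and pushes them away from every other scan in the batch. This is why no manual labels are required.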
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Image Segmentation | MSD Pancreas (test) | DSC | 81.8 | 30 |
| Multi-label Abnormality Analysis | CT-RATE (test) | AUROC | 0.7435 | 24 |
| 3D Segmentation | MSD-Liver | DSC | 93.4 | 15 |
| Multi-label abnormality classification | RAD-ChestCT (test) | AUROC | 0.6326 | 14 |
| Visual Segmentation | KiTS23 | KTC Dice Score | 0.969 | 14 |
| Medical Image Segmentation | MSD Lung (test) | Dice (Label 1) | 70.1 | 6 |
| 3D Organ Segmentation | WORD (test) | DSC (Liver) | 0.965 | 5 |
| 3D Segmentation | AutoPET II | DSC | 32.6 | 5 |
| Tumor phenotype retrieval | NSCLC Radiogenomics | Recall@1 | 50.7 | 4 |
| Tumor phenotype retrieval | C4KC-KiTS | Recall@1 | 64.3 | 4 |
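The benchmarks above report segmentation quality as a Dice similarity coefficient (as a percentage on some leaderboards and as a fraction on others) and retrieval quality as Recall@k. As a rough guide to what these numbers measure, the following sketch computes both metrics on toy inputs in plain Python. The function names are hypothetical. Individual benchmarks may define the metrics differently, for example by averaging Dice per class or per case, or by using their own criterion for a retrieval "hit".

```python
from typing import Hashable, Sequence


def dice_score(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Dice similarity coefficient (DSC) between two flattened binary masks.

    DSC = 2 * |P intersect T| / (|P| + |T|), where P and T are the sets of
    foreground voxels in the prediction and the reference.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / size_sum


def recall_at_k(query_labels: Sequence[Hashable],
                ranked_labels: Sequence[Sequence[Hashable]], k: int = 1) -> float:
    """Fraction of queries whose label appears among the top-k retrieved items.

    ranked_labels[i] lists the labels of the items retrieved for query i,
    ordered from most to least similar.
    """
    if not query_labels or len(query_labels) != len(ranked_labels):
        raise ValueError("need one non-empty ranking per query")
    hits = sum(1 for q, ranked in zip(query_labels, ranked_labels)
               if q in ranked[:k])
    return hits / len(query_labels)


# Usage with toy inputs (not data from the benchmarks above).
print(dice_score([True, True, False, False], [True, False, True, False]))
print(recall_at_k(["a", "b"], [["a", "b"], ["a", "b"]], k=1))
```

A reported DSC of 81.8 therefore corresponds to a fractional Dice of 0.818. Recall@1 counts a query as correct only when its single most similar retrieved scan matches.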