LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
About
Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While deep networks pre-trained on ImageNet and vision-language foundation models trained on web-scale data are the prevailing approaches, their effectiveness on medical tasks is limited by the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, in both in-distribution and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves on previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.
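Contribution (iii) relies on passing gradients through a combinatorial solver that is treated as a black box. The abstract does not say which estimator LVM-Med uses, so the following is only a minimal sketch of one common idea: re-solve the problem on costs perturbed along the incoming gradient and use the difference between the two solutions as a gradient estimate. As a stand-in for the second-order graph-matching solver, it uses a brute-force linear (first-order) assignment solver. The function names `solve_assignment` and `blackbox_grad` and the parameter `lam` are hypothetical and chosen for illustration.

```python
from itertools import permutations


def solve_assignment(cost):
    """Brute-force min-cost assignment (stand-in for a matching solver).

    Returns a 0/1 permutation matrix as a list of lists.
    Only practical for very small n.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return [[1.0 if best[i] == j else 0.0 for j in range(n)] for i in range(n)]


def blackbox_grad(cost, grad_output, lam=1.0):
    """Estimate dL/dcost for a loss L defined on the solver output y.

    grad_output is dL/dy. The solver is called a second time on costs
    perturbed by lam * dL/dy; the scaled difference between the two
    solutions serves as the gradient estimate (an assumed scheme, not
    necessarily the one used by LVM-Med).
    """
    n = len(cost)
    y = solve_assignment(cost)
    perturbed = [
        [cost[i][j] + lam * grad_output[i][j] for j in range(n)]
        for i in range(n)
    ]
    y_lam = solve_assignment(perturbed)
    return [[-(y[i][j] - y_lam[i][j]) / lam for j in range(n)] for i in range(n)]


# Toy example: the solver currently prefers the identity matching,
# while the (made-up) target matching is the swap.
cost = [[1.0, 2.0], [2.0, 1.5]]
target = [[0.0, 1.0], [1.0, 0.0]]
y = solve_assignment(cost)
# Gradient of the squared error sum((y - target)^2) with respect to y.
grad_y = [[2.0 * (y[i][j] - target[i][j]) for j in range(2)] for i in range(2)]
grad_cost = blackbox_grad(cost, grad_y, lam=1.0)
```

In this toy case, a gradient-descent step on `cost` using `grad_cost` raises the costs of the currently selected (wrong) pairs and lowers the costs of the target pairs, which is the behaviour an end-to-end training loop would need from such an estimator.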
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | CheXpert (test) | AUC ROC | 88.07 | 48 |
| Vessel segmentation | XCAV (test) | DSC | 82.47 | 24 |
| Vessel segmentation | ARCADE-V (test) | DSC | 74.82 | 24 |
| Vessel segmentation | CAXF (test) | DSC | 85.21 | 24 |
| Classification | RSNA (test) | Accuracy | 72.75 | 24 |
| Stenosis Segmentation | ARCADE-S (test) | DSC (%) | 39.34 | 23 |
| Vessel Segment Segmentation | ARCADE-VS (test) | DSC | 48.09 | 23 |
| Stenosis Detection | ARCADE 1.0 (test) | mAP50 | 93.64 | 21 |
| Few-shot Learning | MedFMC-ChestDR (test) | AUROC | 0.6744 | 15 |
| Probe Guidance | EchoWorld 30 scans (test) | PLAX (Trans.) | 8.55 | 14 |