
Learning Generalizable 3D Medical Image Representations from Mask-Guided Self-Supervision

About

Foundation models have transformed vision and language by learning general-purpose representations from large-scale unlabeled data, yet 3D medical imaging lacks analogous approaches. Existing self-supervised methods rely on low-level reconstruction or contrastive objectives that fail to capture the anatomical semantics critical for medical image analysis, limiting transfer to downstream tasks. We present MASS (MAsk-guided Self-Supervised learning), which treats in-context segmentation as the pretext task for learning general-purpose medical imaging representations. MASS's key insight is that automatically generated class-agnostic masks provide sufficient structural supervision for learning semantically rich representations. By training on thousands of diverse mask proposals spanning anatomical structures and pathological findings, MASS learns what semantically defines medical structures: the holistic combination of appearance, shape, spatial context, and anatomical relationships. We demonstrate effectiveness across data regimes: from small-scale pretraining on individual datasets (20-200 scans) to large-scale multi-modal pretraining on 5K CT, MRI, and PET volumes, all without annotations. MASS achieves: (i) few-shot segmentation on novel structures, (ii) performance matching full supervision with only 20-40% of labeled data while outperforming self-supervised baselines by over 20 Dice points in low-data regimes, and (iii) frozen-encoder classification on unseen pathologies that matches fully supervised training with thousands of samples. Mask-guided self-supervised pretraining captures broadly generalizable knowledge, opening a path toward 3D medical imaging foundation models without expert annotations. Code is available: https://github.com/Stanford-AIMI/MASS.
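To make the pretext task concrete: the abstract describes supervising the model with automatically generated, class-agnostic mask proposals rather than expert labels, typically scored with a Dice-style overlap objective. The sketch below is a minimal, hypothetical illustration (not the authors' implementation): `threshold_mask_proposal` stands in for whatever proposal generator MASS uses, and the Dice score shows how a prediction would be compared against such a pseudo-mask.

```python
import numpy as np

def dice_score(pred, target, eps=1e-6):
    """Dice overlap between two binary 3D masks (1 = perfect match)."""
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    inter = (pred * target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def threshold_mask_proposal(volume, level):
    """Toy class-agnostic proposal: a simple intensity threshold.
    (Hypothetical stand-in for the paper's mask-proposal machinery.)"""
    return (volume > level).astype(np.uint8)

# Toy 3D "scan": a bright 8x8x8 cube inside a dark background.
vol = np.zeros((16, 16, 16))
vol[4:12, 4:12, 4:12] = 1.0
pseudo_mask = threshold_mask_proposal(vol, 0.5)  # 512 foreground voxels

# An imperfect "prediction" that misses one 8x8 slab of the cube.
pred = pseudo_mask.copy()
pred[4, :, :] = 0  # 448 foreground voxels remain

print(round(dice_score(pred, pseudo_mask), 3))  # → 0.933
```

The point of the sketch is that the supervision signal requires no semantic labels at all: any structurally plausible mask proposal yields a usable overlap target.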

Yunhe Gao, Yabin Zhang, Chong Wang, Jiaming Liu, Maya Varma, Jean-Benoit Delbrouck, Akshay Chaudhari, Curtis Langlotz• 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Segmentation | LiTS | Dice Score | 64.5 | 45 |
| Segmentation | ACDC | DSC | 90 | 41 |
| Classification | Liver Trauma 27 (test) | AUC | 92.3 | 27 |
| Classification | Spleen Trauma 27 (test) | AUC | 90.6 | 27 |
| Classification | RSNA ICH 19 (test) | AUC | 81.5 | 27 |
| Classification | Kidney Trauma 27 (test) | AUC | 90 | 27 |
| Segmentation | BCV | Dice Coefficient | 84.2 | 25 |
| Segmentation | SS H&N | Dice (%) | 78.9 | 25 |
| Segmentation | BraTS T1CE | Dice Score | 72.3 | 25 |
| Segmentation | AMOS MR | Dice | 85 | 25 |

Showing 10 of 17 rows.
