ASAP: Advancing Medical Volumetric Representation Learning with Anatomy-aware Semantically-adaptive Pre-training
About
Learning transferable and interpretable representations from medical volumetric scans remains challenging due to complex anatomical structures and the weak, heterogeneous supervision provided by radiology reports. In this paper, we propose Anatomy-aware Semantically-Adaptive Pre-training (ASAP), a principled vision-language pre-training framework for fine-grained medical volumetric representation learning from large-scale chest CT scans and their corresponding radiology reports. ASAP integrates three key components: (1) an anatomy-aware knowledge injection module that incorporates organ-level structural priors via an off-the-shelf segmentation tool to encourage anatomically coherent representations; (2) a semantically-adaptive selective alignment mechanism that dynamically associates sentence-level findings with localized volumetric regions; and (3) a semantically-adaptive fusion module for effective interaction between anatomically informed visual features and grounded textual cues under a dual-modal masked modeling paradigm. Beyond methodological contributions, we establish a comprehensive benchmark for medical volumetric vision-language pre-training on chest CT, covering 15 datasets and 22 downstream tasks spanning abnormality classification, segmentation, disease prognosis prediction, report generation, vocabulary classification, cross-modal retrieval, and visual question answering. This benchmark provides standardized evaluation protocols to systematically assess representation quality under diverse clinical settings and data regimes. Extensive experiments demonstrate that ASAP consistently achieves state-of-the-art performance across tasks and datasets, with particularly pronounced gains under limited supervision and distribution shift, validating its effectiveness in learning transferable and clinically meaningful volumetric representations.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Classification | CT-RATE | AUC 0.841 | 71 |
| Radiology Report Generation | CT-RATE (test) | BL-1 60 | 49 |
| CT Report Generation | CTRG-Chest-548K (test) | BLEU-4 38.3 | 40 |
| Medical Image Segmentation | COVID-19 CT (test) | DSC 75.1 | 21 |
| Segmentation | BTCV (100% labels) | Dice Coefficient 81.7 | 20 |
| Pulmonary Segmentation | LUNA16 (100% labels) | Dice Score 93.4 | 20 |
| Pulmonary Segmentation | LUNA16 (10% labels) | Dice Coefficient 91.3 | 20 |
| Abnormality Classification | CT-Rate AHPH-10K (test) | AUC 69.8 | 18 |
| Volumetric Segmentation | LUNA16 -> C19-CT (test) | Dice Score 92.6 | 18 |
| In-hospital mortality prediction | INSPECT | AUC 84.4 | 13 |
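
Several rows above report a Dice score (also written DSC or Dice coefficient) on a 0 to 100 scale. As a reminder of what that number measures, here is a minimal sketch of the standard Dice overlap between two binary masks, 2|A∩B| / (|A| + |B|). The function name `dice_coefficient`, the nested-list mask format, and the choice to return 1.0 when both masks are empty are assumptions made for illustration; the page does not say how the paper aggregates Dice over cases or classes.

```python
def _flatten(mask):
    """Yield the voxels of an arbitrarily nested list or tuple in order."""
    for item in mask:
        if isinstance(item, (list, tuple)):
            yield from _flatten(item)
        else:
            yield item


def dice_coefficient(pred, target, empty_value=1.0):
    """Dice overlap between two binary masks given as (nested) lists of 0/1.

    Returns a value in [0, 1]. `empty_value` is returned when both masks
    contain no foreground voxels (an illustrative convention, not one
    taken from the paper).
    """
    p = [bool(v) for v in _flatten(pred)]
    t = [bool(v) for v in _flatten(target)]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(a and b for a, b in zip(p, t))
    denominator = sum(p) + sum(t)
    if denominator == 0:
        return empty_value
    return 2.0 * intersection / denominator


# Toy 2x2x2 volumes: each mask has two foreground voxels, one of them shared.
pred = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]
target = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(100 * dice_coefficient(pred, target))  # 50.0
```

Multiplying by 100 puts the toy result on the same scale as the table entries.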