ASAP: Advancing Medical Volumetric Representation Learning with Anatomy-aware Semantically-adaptive Pre-training

About

Learning transferable and interpretable representations from medical volumetric scans remains challenging due to complex anatomical structures and the weak, heterogeneous supervision provided by radiology reports. In this paper, we propose Anatomy-aware Semantically-Adaptive Pre-training (ASAP), a principled vision-language pre-training framework for fine-grained medical volumetric representation learning from large-scale chest CT scans and their corresponding radiology reports. ASAP integrates three key components: (1) an anatomy-aware knowledge injection module that incorporates organ-level structural priors via an off-the-shelf segmentation tool to encourage anatomically coherent representations; (2) a semantically-adaptive selective alignment mechanism that dynamically associates sentence-level findings with localized volumetric regions; and (3) a semantically-adaptive fusion module for effective interaction between anatomically informed visual features and grounded textual cues under a dual-modal masked modeling paradigm. Beyond methodological contributions, we establish a comprehensive benchmark for medical volumetric vision-language pre-training on chest CT, covering 15 datasets and 22 downstream tasks spanning abnormality classification, segmentation, disease prognosis prediction, report generation, vocabulary classification, cross-modal retrieval, and visual question answering. This benchmark provides standardized evaluation protocols to systematically assess representation quality under diverse clinical settings and data regimes. Extensive experiments demonstrate that ASAP consistently achieves state-of-the-art performance across tasks and datasets, with particularly pronounced gains under limited supervision and distribution shift, validating its effectiveness in learning transferable and clinically meaningful volumetric representations.
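The abstract does not spell out how the selective alignment in component (2) is computed. As a rough illustration only, and not the authors' implementation, one common way to associate sentences with regions is to score each sentence embedding against each region embedding by cosine similarity, keep the top-k regions per sentence, and normalize the surviving scores with a softmax. The function name `selective_alignment`, the parameters `top_k` and `temperature`, and the embedding shapes below are all assumptions made for this sketch.

```python
import numpy as np


def selective_alignment(sentence_emb, region_emb, top_k=2, temperature=0.1):
    """Hypothetical sketch: soft-assign each sentence to its top-k most similar regions.

    sentence_emb: (num_sentences, dim) array of sentence-level text embeddings.
    region_emb:   (num_regions, dim) array of localized volumetric region embeddings.
    Returns a (num_sentences, num_regions) weight matrix whose rows sum to 1.
    """
    # L2-normalize so the dot product equals cosine similarity.
    s = sentence_emb / np.linalg.norm(sentence_emb, axis=1, keepdims=True)
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    sim = s @ r.T  # (num_sentences, num_regions)

    # Mark the top-k regions for every sentence (assumed selection rule).
    top_idx = np.argsort(-sim, axis=1)[:, :top_k]
    mask = np.zeros_like(sim, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)

    # Softmax over the selected regions only; unselected regions get weight 0.
    logits = np.where(mask, sim / temperature, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sentences = rng.normal(size=(3, 8))  # 3 report sentences, 8-dim embeddings
    regions = rng.normal(size=(5, 8))    # 5 candidate regions
    print(selective_alignment(sentences, regions).round(3))
```

In this sketch each row of the returned matrix says how strongly one sentence attends to each region; a hard top-k mask keeps the assignment sparse, which is one plausible reading of "selective" but may differ from what ASAP actually does.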

Rongsheng Wang, Fenghe Tang, Zihang Jiang, Yingtai Li, Xu Zhang, Haoran Lai, Wenxin Ma, Wei Wei, Zhiyang He, Xiaodong Tao, Rui Yan, Qingsong Yao, Shaohua Kevin Zhou • 2026

Related benchmarks

Task	Dataset	Result
Classification	CT-RATE	AUC 0.841	71
Radiology Report Generation	CT-RATE (test)	BL-1 60	49
CT Report Generation	CTRG-Chest-548K (test)	BLEU-4 38.3	40
Medical Image Segmentation	COVID-19 CT (test)	DSC 75.1	21
Segmentation	BTCV (100% labels)	Dice Coefficient 81.7	20
Pulmonary Segmentation	LUNA16 (100% labels)	Dice Score 93.4	20
Pulmonary Segmentation	LUNA16 (10% labels)	Dice Coefficient 91.3	20
Abnormality Classification	CT-Rate AHPH-10K (test)	AUC 69.8	18
Volumetric Segmentation	LUNA16 -> C19-CT (test)	Dice Score 92.6	18
In-hospital mortality prediction	INSPECT	AUC 84.4	13

Showing 10 of 32 rows
