MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration

About

Medical vision-language pretraining (VLP) models have recently been investigated for their generalization to diverse downstream tasks. However, current medical VLP methods typically force the model to learn simple and complex concepts simultaneously. This anti-cognitive process leads to suboptimal feature representations, especially under distribution shift. To address this limitation, we propose Knowledge-driven Cognitive Orchestration for Medical VLP (MedKCO), which orchestrates both the ordering of the pretraining data and the vision-language contrastive learning objective. Specifically, we design a two-level curriculum that orders the pretraining data by diagnostic sensitivity and intra-class sample representativeness. Moreover, considering the inter-class similarity of medical images, we introduce a self-paced asymmetric contrastive loss that dynamically adjusts each sample's participation in the pretraining objective. We evaluate the proposed pretraining method across three medical imaging scenarios on multiple vision-language downstream tasks and compare it with several curriculum learning methods. Extensive experiments show that our method significantly surpasses all baselines. Code: https://github.com/Mr-Talon/MedKCO.
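The page does not reproduce the paper's equations, so the following is only a minimal sketch of the two ingredients named in the abstract, assuming a CLIP-style InfoNCE backbone. The function names, the mixing weights `alpha` and `beta`, and the form of the self-paced weights `pace` are illustrative assumptions, not the authors' released MedKCO implementation.

```python
import torch
import torch.nn.functional as F

def curriculum_order(sensitivity: torch.Tensor,
                     representativeness: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
    """Two-level curriculum ordering of pretraining samples, easy to hard.

    sensitivity:        (N,) per-sample diagnostic-sensitivity scores (assumed)
    representativeness: (N,) intra-class representativeness scores (assumed)
    alpha:              assumed mixing weight between the two curriculum levels
    Samples with a higher combined score are scheduled earlier.
    """
    score = alpha * sensitivity + (1.0 - alpha) * representativeness
    return torch.argsort(score, descending=True)

def self_paced_asymmetric_loss(img_emb: torch.Tensor,
                               txt_emb: torch.Tensor,
                               pace: torch.Tensor,
                               beta: float = 0.6,
                               tau: float = 0.07) -> torch.Tensor:
    """Self-paced, asymmetric image-text contrastive loss (hypothetical form).

    img_emb, txt_emb: (B, D) L2-normalised embeddings
    pace:             (B,) weights in [0, 1] gating each sample's participation
    beta:             assumed asymmetry between the two retrieval directions
    """
    logits = img_emb @ txt_emb.t() / tau                    # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets, reduction="none")      # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets, reduction="none")  # text -> image
    per_sample = beta * loss_i2t + (1.0 - beta) * loss_t2i  # asymmetric mix
    return (pace * per_sample).mean()                       # self-paced gating

# Toy usage with random embeddings and placeholder self-paced weights.
B, D = 8, 128
img = F.normalize(torch.randn(B, D), dim=-1)
txt = F.normalize(torch.randn(B, D), dim=-1)
pace = torch.sigmoid(torch.randn(B))
print(self_paced_asymmetric_loss(img, txt, pace))
```

Under this reading, easy samples (high combined curriculum score) enter training first, and `pace` lets the objective down-weight samples that are still too hard at the current stage; how MedKCO actually computes the two scores and the pacing schedule is specified in the paper, not here.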

Chenran Zhang, Ruiqi Wu, Tao Zhou, Yi Zhou • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Radiology Report Generation | MIMIC-CXR | ROUGE-L | 24.7 | 57
Image Classification | CheXpert 5x200 | Accuracy | 54.8 | 22
Image-to-Text Retrieval | MIMIC-CXR (test) | R@1 | 9.1 | 20
Medical Report Generation | Open-i | CIDEr | 0.33 | 17
Image-to-Text Retrieval | OPENI (test) | -- | -- | 9
Classification | ODIR 200x3 (CFP modality) | Accuracy | 86.3 | 8
Classification | REFUGE (CFP modality) | Accuracy | 94.7 | 8
Classification | FIVES (CFP modality) | AUC | 72.9 | 8
Classification | OCTID (OCT modality) | Accuracy | 77.8 | 8
Classification | OCTDL (OCT modality) | Accuracy | 42.0 | 8

(Showing 10 of 13 rows.)
