Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

About

Despite recent advances in medical vision-language pretraining, existing models still struggle to capture the diagnostic workflow: radiographs are typically treated as context-agnostic images, while radiologists' gaze -- a crucial cue for visual reasoning -- remains largely underexplored. These limitations hinder the modeling of disease-specific patterns and weaken cross-modal alignment. To bridge this gap, we introduce CoGaze, a Context- and Gaze-guided vision-language pretraining framework for chest X-rays. We first propose a context-infused vision encoder that models how radiologists integrate clinical context -- including patient history, symptoms, and diagnostic intent -- to guide diagnostic reasoning. We then present a multi-level supervision paradigm that (1) enforces intra- and inter-modal semantic alignment through hybrid-positive contrastive learning, (2) injects diagnostic priors via disease-aware cross-modal representation learning, and (3) leverages radiologists' gaze as probabilistic priors to guide attention toward diagnostically salient regions. Extensive experiments demonstrate that CoGaze consistently outperforms state-of-the-art methods across diverse tasks, achieving up to +2.0% CheXbert F1 and +1.2% BLEU-2 for free-text and structured report generation, +23.2% AUROC for zero-shot classification, and +12.2% Precision@1 for image-text retrieval. Code is available at https://github.com/mk-runner/CoGaze.
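
The abstract does not spell out how gaze is used as a probabilistic prior, so the following is only a minimal sketch of one common way to do it, not the CoGaze implementation: normalize a gaze heatmap over image patches into a distribution and penalize the KL divergence between it and the model's attention over the same patches. The function names (`softmax`, `normalize`, `gaze_kl_loss`), the choice of KL divergence, and the toy four-patch inputs are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Convert raw attention logits into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(heatmap):
    """Turn non-negative gaze dwell values into a probability distribution.

    Falls back to a uniform distribution when no gaze was recorded.
    """
    total = sum(heatmap)
    if total <= 0:
        return [1.0 / len(heatmap)] * len(heatmap)
    return [h / total for h in heatmap]


def gaze_kl_loss(attn_logits, gaze_heatmap, eps=1e-8):
    """KL(gaze || attention) over patches (hypothetical loss, assumed here).

    The loss is small when the model attends where the radiologist looked
    and grows as attention drifts away from the gazed-at patches.
    """
    p = normalize(gaze_heatmap)   # gaze prior
    q = softmax(attn_logits)      # model attention
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy example: the radiologist looked only at patch 2 (of 4 patches).
gaze = [0.0, 0.0, 1.0, 0.0]
loss_aligned = gaze_kl_loss([0.0, 0.0, 2.0, 0.0], gaze)     # attention on patch 2
loss_misaligned = gaze_kl_loss([2.0, 0.0, 0.0, 0.0], gaze)  # attention on patch 0
```

In this toy setup `loss_aligned` is smaller than `loss_misaligned`, which is the behavior a gaze prior is meant to encourage. In a real training loop such a term would typically be weighted and added to the contrastive and disease-aware objectives described above.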

Kang Liu, Zhuoqi Ma, Siyu Liang, Yunan Li, Xiyue Gao, Chao Liang, Kun Xie, Qiguang Miao • 2026

Related benchmarks

Task	Dataset	Result
Radiology Report Generation	MIMIC-CXR (test)	ROUGE-L 0.326	209
Classification	SIIM	AUC 97.4	67
Chest X-ray classification	NIH (test)	AUROC 86.1	47
Classification	RSNA (test)	F1 Score 84.8	44
Image Classification	SIIM (test)	F1 Score 97.4	30
Lesion Segmentation	RSNA 56	Dice Score 80.22	12
Lesion Segmentation	TBX11K 42	Dice 96.56	12
Classification	Shenzhen 21 (test)	F1 Score 81.3	9
Classification	RSNA 56 (test)	F1 Score 77	9
Structured report generation	SRRG-Findings (test)	BLEU3	4

Showing 10 of 13 rows
