
Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

About

Despite recent advances in medical vision-language pretraining, existing models still struggle to capture the diagnostic workflow: radiographs are typically treated as context-agnostic images, while radiologists' gaze -- a crucial cue for visual reasoning -- remains largely underexplored by existing methods. These limitations hinder the modeling of disease-specific patterns and weaken cross-modal alignment. To bridge this gap, we introduce CoGaze, a Context- and Gaze-guided vision-language pretraining framework for chest X-rays. We first propose a context-infused vision encoder that models how radiologists integrate clinical context -- including patient history, symptoms, and diagnostic intent -- to guide diagnostic reasoning. We then present a multi-level supervision paradigm that (1) enforces intra- and inter-modal semantic alignment through hybrid-positive contrastive learning, (2) injects diagnostic priors via disease-aware cross-modal representation learning, and (3) leverages radiologists' gaze as probabilistic priors to guide attention toward diagnostically salient regions. Extensive experiments demonstrate that CoGaze consistently outperforms state-of-the-art methods across diverse tasks, achieving up to +2.0% CheXbertF1 and +1.2% BLEU2 for free-text and structured report generation, +23.2% AUROC for zero-shot classification, and +12.2% Precision@1 for image-text retrieval. Code is available at https://github.com/mk-runner/CoGaze.
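
The abstract does not say how gaze enters the training objective. As one illustration of using gaze as a probabilistic prior, the sketch below normalizes a gaze heatmap over image patches into a distribution and penalizes the KL divergence from it to the model's patch attention. The function names (`gaze_prior_loss`, `normalize`, `softmax`), the choice of KL divergence, and the patch-level formulation are all assumptions made for this example and are not taken from the CoGaze code.

```python
import math


def softmax(scores):
    """Turn raw attention logits into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(heatmap):
    """Scale non-negative gaze dwell values so they sum to 1.

    Falls back to a uniform distribution when no gaze was recorded.
    """
    total = sum(heatmap)
    if total <= 0:
        n = len(heatmap)
        return [1.0 / n] * n
    return [h / total for h in heatmap]


def gaze_prior_loss(attn_logits, gaze_heatmap, eps=1e-8):
    """Hypothetical loss: KL(gaze || attention) over image patches.

    Patches with zero gaze mass contribute nothing to the sum.
    """
    p = normalize(gaze_heatmap)
    q = softmax(attn_logits)
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy example with four patches; gaze concentrates on patches 1 and 2.
gaze = [0.0, 3.0, 1.0, 0.0]
aligned = [-10.0, math.log(3.0), 0.0, -10.0]  # attention follows gaze
misaligned = [2.0, 0.0, 0.0, 2.0]             # attention looks elsewhere

loss_aligned = gaze_prior_loss(aligned, gaze)
loss_misaligned = gaze_prior_loss(misaligned, gaze)
```

In this toy setting the loss is close to zero when attention matches the gaze distribution and grows as attention drifts to patches the radiologist did not look at, which is the behavior a gaze prior is meant to encourage.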

Kang Liu, Zhuoqi Ma, Siyu Liang, Yunan Li, Xiyue Gao, Chao Liang, Kun Xie, Qiguang Miao • 2026

Related benchmarks

Task                           Dataset                 Result             Rank
Radiology Report Generation    MIMIC-CXR (test)        BLEU-4 0.175       172
Classification                 SIIM                    AUC 97.4           56
Chest X-ray classification     NIH (test)              AUROC 86.1         47
Classification                 RSNA (test)             F1 Score 84.8      44
Image Classification           SIIM (test)             F1 Score 97.4      30
Lesion Segmentation            RSNA 56                 Dice Score 80.22   12
Lesion Segmentation            TBX11K 42               Dice 96.56         12
Classification                 Shenzhen 21 (test)      F1 Score 81.3      9
Classification                 RSNA 56 (test)          F1 Score 77        9
Structured report generation   SRRG-Findings (test)    BLEU3              4
Showing 10 of 13 rows
