Exploring the Capabilities of Large Language Model Encoders for Image-Text Retrieval in Chest X-rays
About
Multimodal learning from paired medical images and clinical text is a central challenge in data-driven medical informatics, where effective cross-modal alignment is critical for scalable analysis and retrieval. In chest radiography, vision-language pretraining is constrained by heterogeneous radiology reports that contain abbreviations, impression-only notes, and institution-specific writing styles. Unlike in general-domain settings, naively aggregating large collections of noisy reports can cause multimodal learning to plateau or even degrade when reporting styles differ substantially. We propose a domain-adapted bidirectional large language model text encoder for chest radiograph reports, trained with masked token prediction and supervised contrastive learning on stylistically diverse but clinically equivalent report variants to produce robust, generalizable text embeddings. We then integrate this encoder into a dual-tower contrastive vision-language framework using parameter-efficient adaptation to improve image-text alignment. Across 1.6 million paired studies from public datasets and a de-identified hospital cohort, the proposed models improve bidirectional retrieval accuracy and external generalization, achieving GREEN scores of 0.308 on MIMIC-CXR and 0.618 on Open-I, while reducing the degradation observed when abbreviation-rich, impression-only hospital reports are added to training.
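The abstract names two contrastive objectives but does not give their exact form. The sketch below assumes standard formulations: a supervised contrastive (SupCon-style) loss in which stylistically different report variants of the same study share a label and are treated as positives, and a CLIP-style symmetric InfoNCE loss for the dual-tower image-text alignment. The function names, the `study_ids` labels, the temperature of 0.07, and the use of plain Python lists as embeddings are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: plain Python lists stand in for encoder outputs.
Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between every row of a and every row of b."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def log_softmax_at(logits: Sequence[float], idx: int) -> float:
    """Numerically stable log-softmax of logits, evaluated at position idx."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_sum_exp


def symmetric_info_nce(img_emb, txt_emb, temperature=0.07) -> float:
    """CLIP-style loss (assumed): image i and report i form the only positive pair."""
    logits = [[s / temperature for s in row] for row in cosine_matrix(img_emb, txt_emb)]
    n = len(logits)
    # Image-to-text: each row is a softmax over all reports in the batch.
    i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column is a softmax over all images in the batch.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax_at(cols[c], c) for c in range(n)) / n
    return (i2t + t2i) / 2


def supervised_contrastive(txt_emb, study_ids, temperature=0.07) -> float:
    """SupCon-style loss (assumed): report variants sharing a study_id are positives."""
    sims = cosine_matrix(txt_emb, txt_emb)
    total, anchors = 0.0, 0
    for i in range(len(txt_emb)):
        others = [j for j in range(len(txt_emb)) if j != i]
        # Positions (within `others`) of variants from the same study as anchor i.
        positives = [k for k, j in enumerate(others) if study_ids[j] == study_ids[i]]
        if not positives:
            continue  # an anchor with no variant in the batch contributes nothing
        logits = [sims[i][j] / temperature for j in others]
        total += -sum(log_softmax_at(logits, k) for k in positives) / len(positives)
        anchors += 1
    return total / max(anchors, 1)


# Usage: aligned image/report pairs should give a lower loss than swapped pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
reports = [[0.9, 0.1], [0.1, 0.9]]
print(symmetric_info_nce(images, reports))        # small: pairs are aligned
print(symmetric_info_nce(images, reports[::-1]))  # large: pairs are swapped
```

Both losses are averages of negative log-softmax terms, so they are non-negative and shrink as matching items score higher than non-matching ones in the same batch. In a real training loop these would be computed on GPU tensors by a framework's cross-entropy routine rather than in pure Python.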
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Unconditional Image-to-Report Retrieval | MIMIC-IR Chest X-Ray | Recall@5 | 98.9 | 15 |
| Medical acronym understanding retrieval | Chest X-ray reports | Recall@1 | 61.1 | 11 |
| Report error discrimination | Chest X-ray reports | Accuracy | 84.1 | 11 |
| Report summarization retrieval | Chest X-ray reports | Recall@1 | 21.2 | 11 |
| Clinical similarity matching | Chest X-ray reports | RadGraph | 0.402 | 11 |
| Multimodal Image-Text Retrieval | Open-I 200 cases | Student Mean Rank | 1.85 | 6 |
| Radiologist-reference ranking | Chest X-rays 72-case subset | Expert Mean Rank | 1.29 | 3 |
| Radiologist-reference ranking | Chest X-rays 72-case (test) | Expert Mean Rank | 1.29 | 3 |
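The Recall@K rows in the table measure how often the correct match appears among a model's top K candidates. A minimal sketch, assuming a query-by-candidate similarity matrix in which candidate i is the single correct match for query i; the benchmarks' actual protocols (for example, how multiple acceptable reports are handled) are not described here, and the Mean Rank, Accuracy, and RadGraph rows are not covered. The function name `recall_at_k` is illustrative.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose ground-truth candidate (same index) ranks in the top k.

    Assumes sim[i][j] scores query i against candidate j and that candidate i
    is the single correct match for query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Sort candidates by descending score; ties are broken by lower index.
        ranked = sorted(range(len(row)), key=lambda j: (-row[j], j))
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy 3x3 example: query 0 ranks its match first, queries 1 and 2 rank it second.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
print(recall_at_k(sim, 1))  # 33.33...
print(recall_at_k(sim, 2))  # 100.0
```

Returning a percentage keeps the output on the same scale as the Recall values reported in the table.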