
CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

About

Large-scale image-text pair datasets have greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, the scarcity of data remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pairs into image-text pairs via a general prompt and by utilizing multiple images and multiple sections of a radiologic report. We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports, respectively. Our model outperforms the state-of-the-art models trained under the same conditions. Also, the enlarged dataset improves the discriminative power of our pre-trained model for classification, at the cost of a marginal drop in retrieval performance. Code is available at https://github.com/kakaobrain/cxr-clip.
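The two ideas in the abstract — turning image-label pairs into image-text pairs with prompt templates, and training with contrastive losses over paired image/text embeddings — can be sketched as follows. This is a minimal illustration only: the templates are hypothetical, and the loss shown is the generic CLIP-style symmetric InfoNCE objective that losses like ICL and TCL build on, not the paper's exact study-level formulation.

```python
import numpy as np

# Hypothetical prompt templates for expanding a class label (e.g. from an
# image-label dataset) into a synthetic report sentence. The paper's actual
# "general prompt" wording may differ.
TEMPLATES = ["There is {}.", "Findings suggest {}."]

def label_to_text(label, rng):
    """Expand an image label into an image-text caption via a random template."""
    return rng.choice(TEMPLATES).format(label)

def clip_style_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matched pairs sit on the diagonal of the similarity matrix; the loss
    pulls them together and pushes mismatched pairs apart, in both the
    image-to-text and text-to-image directions.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N) similarity matrix
    idx = np.arange(len(img))                 # matched pairs on the diagonal

    def xent(l):
        # Row-wise cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings the loss approaches zero; shuffling the text side against the image side drives it up, which is what makes the objective usable for zero-shot retrieval and classification.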

Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyuk Baek, Byungseok Roh • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Classification | SIIM | AUC 94 | 54 |
| Thoracic Disease Classification | MIMIC-CXR (test) | Atelectasis AUC 50 | 28 |
| Classification | VinDR-CXR | AUC 0.89 | 24 |
| Classification | RSNA | AUC 89.8 | 24 |
| Image-to-Text Retrieval | Open-i | -- | 17 |
| Image-to-Text Retrieval | CheXpert 5x200 | R@1 9.4 | 13 |
| Image-to-Text Retrieval | MIMIC-CXR | R@1 21.6 | 13 |
| Image Classification | MIMIC 5x200 (test) | Accuracy 49.7 | 9 |
| Text-Image Retrieval | MIMIC-CXR 5x200 | mAP@1 60.2 | 9 |
| Image-Text Retrieval | MIMIC-CXR 5x200 | mAP@1 51.8 | 9 |

Showing 10 of 18 rows.

Other info

Code: https://github.com/kakaobrain/cxr-clip