CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training
About
Large-scale image-text pair datasets have greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. In the medical domain, however, data scarcity remains a significant obstacle to building a powerful VLP model. In this paper, we tackle the lack of image-text data for chest X-rays by expanding image-label pairs into image-text pairs via general prompts and by utilizing the multiple images and multiple report sections available in a radiologic study. We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports, respectively. Our model outperforms state-of-the-art models trained under the same conditions. Moreover, the enlarged dataset improves the discriminative power of our pre-trained model for classification at the cost of only a marginal drop in retrieval performance. Code is available at https://github.com/kakaobrain/cxr-clip.
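The ICL and TCL losses described above are contrastive objectives in the CLIP family. While the exact formulations are defined in the paper and repository, the general form such losses take is a symmetric InfoNCE loss over matched image/text embedding pairs. The sketch below is illustrative only (NumPy, with a hypothetical `info_nce` helper and an assumed temperature of 0.07), not the authors' implementation:

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, text) embeddings.

    Illustrative sketch of a CLIP-style contrastive objective; the actual
    ICL/TCL losses are defined in the CXR-CLIP paper and repository.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); matched pairs lie on the diagonal
    n = len(img)

    def xent(l):
        # cross-entropy with the diagonal as the target class, numerically stable
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # average of image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2
```

Pulling matched pairs together on the diagonal while pushing apart all off-diagonal pairs is what lets the pre-trained encoders perform zero-shot classification and retrieval.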
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | SIIM | AUC | 94 | 54 |
| Thoracic Disease Classification | MIMIC-CXR (test) | Atelectasis AUC | 50 | 28 |
| Classification | VinDR-CXR | AUC | 0.89 | 24 |
| Classification | RSNA | AUC | 89.8 | 24 |
| Image-to-Text Retrieval | Open-i | -- | -- | 17 |
| Image-to-Text Retrieval | CheXpert 5x200 | R@1 | 9.4 | 13 |
| Image-to-Text Retrieval | MIMIC-CXR | R@1 | 21.6 | 13 |
| Image Classification | MIMIC 5x200 (test) | Accuracy | 49.7 | 9 |
| Text-Image Retrieval | MIMIC-CXR 5x200 | mAP@1 | 60.2 | 9 |
| Image-Text Retrieval | MIMIC-CXR 5x200 | mAP@1 | 51.8 | 9 |