
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

About

Foundation models trained on large-scale datasets have recently surged in CV and NLP. In contrast, development in the biomedical domain lags far behind due to data scarcity. To address this issue, we build and release PMC-OA, a biomedical dataset with 1.6M image-caption pairs collected from PubMed Central's Open Access subset, 8 times larger than previous datasets. PMC-OA covers diverse modalities and diseases, with the majority of image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption. By pretraining a CLIP-style model on PMC-OA, our model, named PMC-CLIP, achieves state-of-the-art results on various downstream tasks, including image-text retrieval on ROCO, MedMNIST image classification, and Medical VQA, e.g., +8.1% R@10 on image-text retrieval and +3.9% accuracy on image classification.
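The CLIP-style pretraining mentioned above trains image and text encoders so that matched image-caption pairs score higher than mismatched ones, via a symmetric contrastive (InfoNCE) objective. The following is a minimal NumPy sketch of that loss, not PMC-CLIP's actual training code; the function name, temperature value, and use of NumPy are illustrative assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched
    image-caption pair. Illustrative sketch, not the paper's exact code.
    """
    # L2-normalize embeddings so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, sharpened by the temperature
    logits = img @ txt.T / temperature

    # Matched pairs lie on the diagonal
    n = logits.shape[0]
    labels = np.arange(n)

    def cross_entropy(l, y):
        # Numerically stable log-softmax cross-entropy over rows
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average of the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

In training, gradients of this loss pull each subfigure embedding toward its subcaption embedding and push it away from the other captions in the batch.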

Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Multiple-choice Visual Question Answering | PMC-VQA (test) | Accuracy | 24.7 | 50 |
| Visual Question Answering | VQA-RAD | Closed Accuracy | 84 | 49 |
| Anatomy-conditioned Image Retrieval | MIMIC-IR official (test) | Recall@3 | 16.83 | 44 |
| Medical Image Classification | MedMNIST Derma (test) | Accuracy | 79.8 | 36 |
| Medical Image Classification | MedMNIST Pneumonia (test) | Accuracy | 95.4 | 36 |
| Medical Image Classification | MedMNIST Breast (test) | Accuracy | 91.4 | 36 |
| Visual Question Answering | VQA-RAD (test) | Open-ended Accuracy | 67 | 33 |
| Medical Image Classification | RetinaMNIST v2 (test) | Accuracy | 52.2 | 33 |
| Classification | PneumoniaMNIST MedMNIST v2 (test) | Accuracy | 84.5 | 32 |
| Visual Question Answering | Slake | Closed Accuracy | 88 | 27 |

Showing 10 of 68 rows
