Few-shot Adaptation of Medical Vision-Language Models

About

Integrating image and text data through multi-modal learning has emerged as a new approach in medical imaging research, following its successful deployment in computer vision. While considerable efforts have been dedicated to building medical foundation models and evaluating their zero-shot transfer to downstream tasks, the popular few-shot setting remains relatively unexplored. Motivated by the strong recent emergence of this setting in computer vision, we introduce the first structured benchmark for adapting medical vision-language models (VLMs) in a strict few-shot regime and investigate various adaptation strategies commonly used in the context of natural images. Furthermore, we evaluate a simple generalization of the linear-probe adaptation baseline, which seeks an optimal blending of the visual prototypes and text embeddings via learnable class-wise multipliers. Surprisingly, such a text-informed linear probe yields performance competitive with convoluted prompt-learning and adapter-based strategies, while running considerably faster and accommodating the black-box setting. Our extensive experiments span three different medical modalities and specialized foundation models, nine downstream tasks, and several state-of-the-art few-shot adaptation methods. We make our benchmark and code publicly available to encourage further developments in this emerging subject: https://github.com/FereshteShakeri/few-shot-MedVLMs.
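
The text-informed linear probe described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration based only on the abstract's description, not the authors' actual implementation: the class name `TextInformedLinearProbe`, the additive blending scheme, and the temperature value are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextInformedLinearProbe(nn.Module):
    """Sketch of a linear probe whose class weights blend visual
    prototypes with text embeddings via learnable class-wise multipliers.
    (Assumed formulation, reconstructed from the abstract.)"""

    def __init__(self, visual_prototypes, text_embeddings, temperature=0.07):
        # visual_prototypes, text_embeddings: (num_classes, feature_dim)
        super().__init__()
        # Learnable visual part, initialized from the few-shot class prototypes.
        self.weights = nn.Parameter(visual_prototypes.clone())
        # Frozen text embeddings of the class prompts.
        self.register_buffer("text", F.normalize(text_embeddings, dim=-1))
        # One learnable blending multiplier per class ("class-wise multipliers").
        self.alpha = nn.Parameter(torch.ones(text_embeddings.size(0)))
        self.temperature = temperature

    def forward(self, features):
        # Blend each class weight with its text embedding, then score
        # the (normalized) image features by cosine similarity.
        w = F.normalize(self.weights + self.alpha.unsqueeze(-1) * self.text, dim=-1)
        feats = F.normalize(features, dim=-1)
        return feats @ w.t() / self.temperature
```

In a few-shot run, `visual_prototypes` would be the per-class mean of the support-shot image features and `text_embeddings` the encoded class prompts. Since training touches only `weights` and `alpha` on precomputed features, such a probe also fits the black-box setting mentioned above, where the encoder itself is inaccessible.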

Fereshteh Shakeri, Yunshi Huang, Julio Silva-Rodríguez, Houda Bahig, An Tang, Jose Dolz, Ismail Ben Ayed • 2024

Related benchmarks

Task                        Dataset      Result             Rank
Surgical Phase Recognition  Cholec80     Average F1: 39.06  35
Surgical Phase Recognition  Autolaparo   Average F1: 44.72  20
Surgical Phase Recognition  StrasBypass  Average F1: 42.79  15
Surgical Phase Recognition  BernBypass   Average F1: 29.74  15
