Med-Flamingo: a Multimodal Medical Few-shot Learner

About

Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which poses a significant limitation because data is scarce in many medical applications, necessitating models that can learn from few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and, for the first time, enables multimodal medical few-shot adaptations such as rationale generation. We release our model, code, and evaluation app at https://github.com/snap-stanford/med-flamingo.

Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Cyril Zakka, Yash Dalmia, Eduardo Pontes Reis, Pranav Rajpurkar, Jure Leskovec • 2023
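
To make the few-shot setting described in the abstract more concrete, the following is a minimal sketch of how an interleaved image-text prompt for few-shot VQA might be assembled. It is an illustration under stated assumptions, not the authors' implementation: the `<image>` and `<|endofchunk|>` markers follow the convention commonly used with OpenFlamingo-style models, and the `Shot` class, the `build_few_shot_prompt` function, and the "Question: ... Answer: ..." template are hypothetical names and choices made for this example. The images themselves would be passed to the model separately, which is not shown here.

```python
from dataclasses import dataclass
from typing import List

# Assumed special tokens (OpenFlamingo-style convention; verify against the
# tokenizer of the model actually being used).
IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Shot:
    """One in-context example: a question about an image and its answer (hypothetical helper)."""
    question: str
    answer: str


def build_few_shot_prompt(shots: List[Shot], query_question: str) -> str:
    """Interleave image placeholders with question-answer text.

    Each demonstration contributes one image placeholder followed by its
    question and answer; the final query ends with an open "Answer:" so the
    model generates the completion.
    """
    parts = [
        f"{IMAGE_TOKEN}Question: {shot.question} Answer: {shot.answer}{END_OF_CHUNK}"
        for shot in shots
    ]
    parts.append(f"{IMAGE_TOKEN}Question: {query_question} Answer:")
    return "".join(parts)


# Illustrative, made-up demonstrations (not taken from any dataset).
example_shots = [
    Shot("Which imaging modality is shown?", "Chest X-ray."),
    Shot("Which organ is depicted?", "The liver."),
]
example_prompt = build_few_shot_prompt(example_shots, "Is there a fracture?")
```

With two demonstrations, the resulting prompt contains three image placeholders (two for the examples, one for the query), so three images would need to be supplied in the same order.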

Related benchmarks

Task	Dataset	Result
Medical Visual Question Answering	Slake	Accuracy 43.5	247
Medical Visual Question Answering	VQA-RAD	Accuracy 45.4	228
Medical Visual Question Answering	PMC-VQA	Accuracy 23.3	103
Medical Visual Question Answering	PathVQA	--	92
Medical Visual Question Answering	PathVQA	Accuracy 31.3	80
Medical Visual Question Answering	SLAKE (test)	--	67
Medical Visual Question Answering	PathVQA (test)	Accuracy 47.9	55
Medical Visual Question Answering	OmniMedVQA (test)	CT Accuracy 38.5	50
Medical Visual Question Answering	VQA-RAD (test)	--	50
Medical Visual Question Answering	OmniMedVQA	Accuracy 34.9	48

Showing 10 of 53 rows
