Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models

About

Recent general-purpose and domain-specific multimodal large language models (LLMs) have made remarkable progress in medical decision-making. However, they are designed for specific classification or generative tasks and require training or fine-tuning on large-scale datasets with sizeable parameter counts and substantial compute, which hinders their clinical utility in resource-constrained settings. In this paper, we propose Med-MoE (Mixture-of-Experts), a novel and lightweight framework that tackles both discriminative and generative multimodal medical tasks. Med-MoE is trained in three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning. After aligning medical images with LLM tokens, we enable the model to handle different multimodal medical tasks through instruction tuning, together with a trainable router tailored for expert selection across input modalities. Finally, the model is tuned by integrating the router with multiple domain-specific experts, which are selectively activated and further empowered by a meta expert. Comprehensive experiments on both open- and closed-end medical visual question answering (Med-VQA) and image classification tasks, across datasets such as VQA-RAD, SLAKE and Path-VQA, demonstrate that our model achieves performance superior to or on par with state-of-the-art baselines while requiring only approximately 30%-50% of activated model parameters. Extensive analysis and ablations corroborate the effectiveness and practical utility of our method.

Songtao Jiang, Tuo Zheng, Yan Zhang, Yeying Jin, Li Yuan, Zuozhu Liu • 2024
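
The abstract describes a router that selectively activates domain-specific experts, with a meta expert that further supports them. The following is a minimal sketch of that idea, not the authors' implementation. It assumes softmax gating with top-k selection, experts modeled as plain linear maps, and a meta expert whose output is simply added for every input. The class name `ToyMedMoELayer` and all its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_matrix(rows, cols, rng):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyMedMoELayer:
    """Hypothetical toy layer: router + domain experts + always-on meta expert."""

    def __init__(self, dim, num_experts, top_k=1, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.router = make_matrix(num_experts, dim, rng)
        self.experts = [make_matrix(dim, dim, rng) for _ in range(num_experts)]
        self.meta_expert = make_matrix(dim, dim, rng)

    def route(self, x):
        """Return indices of the top-k experts and the full gate distribution."""
        probs = softmax(linear(self.router, x))
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        return ranked[: self.top_k], probs

    def forward(self, x):
        chosen, probs = self.route(x)
        norm = sum(probs[i] for i in chosen)
        # Assumption: the meta expert runs for every input.
        out = linear(self.meta_expert, x)
        # Only the selected domain experts are evaluated.
        for i in chosen:
            gate = probs[i] / norm
            expert_out = linear(self.experts[i], x)
            out = [o + gate * e for o, e in zip(out, expert_out)]
        return out, chosen

    def activated_fraction(self):
        """Share of expert matrices used per input (router weights excluded)."""
        return (self.top_k + 1) / (len(self.experts) + 1)


layer = ToyMedMoELayer(dim=4, num_experts=4, top_k=1)
output, chosen = layer.forward([0.2, -0.1, 0.4, 0.3])
print("selected experts:", chosen)
print("activated fraction of expert weights:", layer.activated_fraction())
```

In this toy setup with four domain experts and top-1 routing, two of five expert matrices run per input. That figure is a property of the sketch only and is not a reproduction of the activated-parameter ratio reported in the abstract.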

Related benchmarks

Task	Dataset	Metric	Result	
Medical Visual Question Answering	SLAKE closed-end	Accuracy	83.41	54
Medical Visual Question Answering	VQA-RAD closed-end	Accuracy	80.07	45
Medical Visual Question Answering	PathVQA closed-end	Accuracy	91.3	35
Medical Reasoning	Breast Ultrasound (test)	ROUGE Score	61	5
Medical Reasoning	Brain MRI (test)	ROUGE	66	5
Medical Reasoning	Lung CT-Scan (test)	ROUGE	69	5
Medical Reasoning	Lung X-ray (test)	ROUGE	61	5
Medical Reasoning	Polyp (test)	ROUGE	0.67	5
Medical Reasoning	Skin Image (test)	ROUGE Score	0.68	5

