
Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback

About

Existing Medical Large Vision-Language Models (Med-LVLMs), encapsulating extensive medical knowledge, demonstrate excellent capabilities in understanding medical images. However, challenges remain in visual localization within medical images, which is crucial for abnormality detection and interpretation. To address these issues, we propose a novel UMed-LVLM designed to unveil medical abnormalities. Specifically, we collect a Medical Abnormalities Unveiling (MAU) dataset and propose a two-stage training method for UMed-LVLM. To collect the MAU dataset, we propose a prompting method utilizing GPT-4V to generate diagnoses based on identified abnormal areas in medical images. The two-stage training method comprises Abnormal-Aware Instruction Tuning and Abnormal-Aware Rewarding, the latter combining a Relevance Reward, an Abnormal Localization Reward, and a Vision Relevance Reward. Experimental results demonstrate that UMed-LVLM significantly outperforms existing Med-LVLMs in identifying and understanding medical abnormalities, achieving a 58% improvement over the baseline. In addition, this work shows that enhancing the abnormality detection capabilities of Med-LVLMs significantly improves their understanding of medical images and their generalization capability.
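The abstract names three reward terms but gives no formulas, so the following is only a minimal sketch of how such an abnormal-aware reward could be combined. All function names, the IoU-based localization term, and the equal default weights are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): combine the three reward terms
# named in the abstract. Assumes boxes are (x1, y1, x2, y2) tuples and that
# relevance scores are already normalized to [0, 1].

def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def abnormal_aware_reward(pred_box, gt_box, relevance, vision_relevance,
                          weights=(1.0, 1.0, 1.0)):
    """Weighted sum of relevance, abnormal-localization (IoU here, as an
    assumption), and vision-relevance rewards."""
    w_rel, w_loc, w_vis = weights
    return (w_rel * relevance
            + w_loc * box_iou(pred_box, gt_box)
            + w_vis * vision_relevance)
```

In practice such a scalar reward would drive a policy-optimization step over the model's generated diagnoses; the weighting between terms is a design choice the abstract does not specify.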

Yucheng Zhou, Lingran Song, Jianbing Shen • 2025

Related benchmarks

Task | Dataset | Result | Rank
Multiple-choice Visual Question Answering | PMC-VQA (test) | Accuracy: 42.6 | 50
Medical Image Classification | MedMNIST Derma (test) | Accuracy: 84.1 | 36
Medical Image Classification | MedMNIST Breast (test) | Accuracy: 92.8 | 36
Medical Image Classification | MedMNIST Pneumonia (test) | Accuracy: 95.8 | 36
Visual Question Answering | VQA-RAD (test) | Open-ended Accuracy: 74.9 | 33
Medical Diagnosis | MAU (test) | DL Score: 53 | 13
Fill-in-the-blank Visual Question Answering | PMC-VQA (test) | Accuracy: 38.1 | 5
