
Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback

About

Existing Medical Large Vision-Language Models (Med-LVLMs), encapsulating extensive medical knowledge, demonstrate excellent capabilities in understanding medical images. However, challenges remain in visual localization within medical images, which is crucial for abnormality detection and interpretation. To address these issues, we propose a novel UMed-LVLM designed to unveil medical abnormalities. Specifically, we collect a Medical Abnormalities Unveiling (MAU) dataset and propose a two-stage training method for UMed-LVLM. To collect the MAU dataset, we propose a prompting method utilizing GPT-4V to generate diagnoses based on identified abnormal areas in medical images. The two-stage training method comprises Abnormal-Aware Instruction Tuning and Abnormal-Aware Rewarding, the latter combining a Relevance Reward, an Abnormal Localization Reward, and a Vision Relevance Reward. Experimental results demonstrate that UMed-LVLM significantly outperforms existing Med-LVLMs in identifying and understanding medical abnormalities, achieving a 58% improvement over the baseline. In addition, this work shows that enhancing the abnormality detection capabilities of Med-LVLMs significantly improves their understanding of medical images and their generalization capability.
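The abstract names three reward terms but gives no formulas, so the following is only a minimal sketch of how such an abnormal-aware reward could be combined. All function names, the IoU-based localization term, and the equal default weights are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): combine the three reward terms
# named in the abstract. Assumes boxes are (x1, y1, x2, y2) tuples and that
# relevance scores are already normalized to [0, 1].

def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def abnormal_aware_reward(pred_box, gt_box, relevance, vision_relevance,
                          weights=(1.0, 1.0, 1.0)):
    """Weighted sum of relevance, abnormal-localization (IoU here, as an
    assumption), and vision-relevance rewards."""
    w_rel, w_loc, w_vis = weights
    return (w_rel * relevance
            + w_loc * box_iou(pred_box, gt_box)
            + w_vis * vision_relevance)
```

In practice such a scalar reward would drive a policy-optimization step over the model's generated diagnoses; the weighting between terms is a design choice the abstract does not specify.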

Yucheng Zhou, Lingran Song, Jianbing Shen • 2025

Related benchmarks

Task | Dataset | Result | Rank
Multiple-choice Visual Question Answering | PMC-VQA (test) | Accuracy: 42.6 | 50
Medical Image Classification | MedMNIST Derma (test) | Accuracy: 84.1 | 36
Medical Image Classification | MedMNIST Breast (test) | Accuracy: 92.8 | 36
Medical Image Classification | MedMNIST Pneumonia (test) | Accuracy: 95.8 | 36
Visual Question Answering | VQA-RAD (test) | Open-ended Accuracy: 74.9 | 33
Medical Diagnosis | MAU (test) | DL Score: 53 | 13
Fill-in-the-blank Visual Question Answering | PMC-VQA (test) | Accuracy: 38.1 | 5
