MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

About

Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative, lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering visual evidence that is consistent with the model's reasoning. Through this staged design, MEDIC-AD improves performance at each stage across anomaly detection, symptom tracking, and anomaly segmentation, achieving state-of-the-art results compared with both closed-source and medical-specialized baselines. Evaluations on longitudinal clinical data collected from real hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.

Woohyeon Park, Jaeik Kim, Sunghwan Steve Cho, Pa Hong, Wookyoung Jeong, Yoojin Nam, Namjoon Kim, Ginny Y. Wong, Ka Chun Cheung, Jaeyoung Do • 2026

Related benchmarks

Task                                 Dataset      Result          Rank
Medical Question Answering           MedMCQA      Accuracy 56.7   346
Medical Visual Question Answering    Slake        Accuracy 78.5   239
Medical Visual Question Answering    VQA-RAD      Accuracy 64.3   198
Medical Question Answering           PubMedQA     Accuracy 75.6   92
Medical Visual Question Answering    PMC-VQA      Accuracy 56.1   74
Medical Visual Question Answering    PathVQA      Accuracy 56.5   50
Anomaly Detection                    Br35H        --              45
Medical Question Answering           MedQA        Accuracy 63.6   40
Image-level Anomaly Detection        HeadCT       --              37
Medical Question Answering           MedXpertQA   Accuracy 16.5   31
Showing 10 of 19 rows
