MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

About

Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative, lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering clear visual evidence consistent with the model's reasoning. Through this staged design, MEDIC-AD steadily improves performance across anomaly detection, symptom tracking, and anomaly segmentation, achieving state-of-the-art results compared with both closed-source and medical-specialized baselines. Evaluations on longitudinal clinical data collected from real hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.
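The abstract does not say how the <Ano> and <Diff> tokens are constructed or how the heatmaps are produced. Purely as an illustration, the pure-Python sketch below assumes that (a) <Ano> tokens are a few randomly initialised embeddings that a real model would train, (b) <Diff> tokens are the patch-wise difference between embeddings of a prior and a current study, and (c) a rough heatmap can be obtained by scoring each patch against a query token. All function names are hypothetical, and the similarity-based heatmap is a simple stand-in, not the trained explainability stage described above.

```python
import math
import random
from typing import List

Vector = List[float]


def make_learnable_tokens(n_tokens: int, dim: int, seed: int = 0) -> List[Vector]:
    """Hypothetical <Ano> tokens: small random embeddings (trainable parameters in a real model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_tokens)]


def diff_tokens(prior: List[Vector], current: List[Vector]) -> List[Vector]:
    """Hypothetical <Diff> tokens: patch-wise change from the prior study to the current one."""
    if len(prior) != len(current):
        raise ValueError("both studies must have the same number of patches")
    return [[c - p for p, c in zip(pv, cv)] for pv, cv in zip(prior, current)]


def build_input_sequence(ano: List[Vector], diff: List[Vector], patches: List[Vector]) -> List[Vector]:
    """Place the special tokens ahead of the current image's patch embeddings (assumed ordering)."""
    return ano + diff + patches


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def patch_heatmap(query: Vector, patches: List[Vector]) -> List[float]:
    """Score each patch by cosine similarity to a query token, min-max scaled to [0, 1]."""
    scores = [cosine(query, p) for p in patches]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    prior = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    current = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
    diff = diff_tokens(prior, current)  # only the second patch changed
    seq = build_input_sequence(make_learnable_tokens(2, 2), diff, current)
    print(diff, len(seq))
    print(patch_heatmap([0.0, 1.0], current))
```

In this toy example only the second patch changes between studies, so it is the only non-zero <Diff> token, and it also receives the highest heatmap score for the query direction used.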

Woohyeon Park, Jaeik Kim, Sunghwan Steve Cho, Pa Hong, Wookyoung Jeong, Yoojin Nam, Namjoon Kim, Ginny Y. Wong, Ka Chun Cheung, Jaeyoung Do • 2026

Related benchmarks

Task	Dataset	Result
Medical Question Answering	MedMCQA	Accuracy 56.7	591
Medical Visual Question Answering	Slake	Accuracy 78.5	289
Medical Visual Question Answering	VQA-RAD	Accuracy 64.3	251
Medical Question Answering	MedQA	Accuracy 63.6	179
Medical Question Answering	PubMedQA	Accuracy 75.6	122
Medical Visual Question Answering	PMC-VQA	Accuracy 56.1	103
Medical Visual Question Answering	PathVQA	Accuracy 56.5	103
Anomaly Detection	Br35H	--	45
Image-level Anomaly Detection	HeadCT	--	37
Medical Question Answering	MedXpertQA	Accuracy 16.5	31

Showing 10 of 19 rows
