MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence
About
Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative, lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering visual evidence that is consistent with the model's reasoning. With this staged design, MEDIC-AD improves performance at each stage across anomaly detection, symptom tracking, and anomaly segmentation, achieving state-of-the-art results compared with both closed-source and medical-specialized baselines. Evaluations on longitudinal clinical data collected from real hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.
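The abstract does not say how the <Ano> and <Diff> tokens are wired into the model. As a rough illustration of the general idea of adding learnable special tokens, the sketch below extends a toy vocabulary, grows an embedding table to match, and builds an input sequence. Only the two token names come from the abstract; the helper names (`extend_vocab`, `grow_embeddings`, `build_input_ids`), the toy vocabulary, the embedding size, and the sequence layout are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import random

ANO, DIFF = "<Ano>", "<Diff>"  # token names taken from the abstract


def extend_vocab(vocab, new_tokens):
    """Return a copy of `vocab` with each new token given the next free id.

    Assumes existing ids are contiguous and start at 0.
    """
    extended = dict(vocab)
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


def grow_embeddings(table, new_size, seed=0):
    """Append small random rows until the table has `new_size` rows.

    In a real model the new rows would be trainable parameters.
    """
    rng = random.Random(seed)
    dim = len(table[0])
    grown = [row[:] for row in table]
    while len(grown) < new_size:
        grown.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return grown


def build_input_ids(vocab, current_ids, prior_ids=None):
    """Assumed layout: <Ano> + current study, then <Diff> + prior study if given."""
    ids = [vocab[ANO]] + list(current_ids)
    if prior_ids is not None:
        ids += [vocab[DIFF]] + list(prior_ids)
    return ids


# Toy usage with a hypothetical two-entry vocabulary and 4-dimensional embeddings.
base_vocab = {"<pad>": 0, "<img>": 1}
vocab = extend_vocab(base_vocab, [ANO, DIFF])
embeddings = grow_embeddings([[0.0] * 4, [0.1] * 4], len(vocab))
single_study = build_input_ids(vocab, [1, 1])
paired_studies = build_input_ids(vocab, [1, 1], prior_ids=[1])
```

In this sketch a single-study input carries only the anomaly token, while a paired input adds the difference token before the prior study, mirroring the abstract's split between anomaly detection and tracking change across studies.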
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy 56.7 | 346 |
| Medical Visual Question Answering | Slake | Accuracy 78.5 | 239 |
| Medical Visual Question Answering | VQA-RAD | Accuracy 64.3 | 198 |
| Medical Question Answering | PubMedQA | Accuracy 75.6 | 92 |
| Medical Visual Question Answering | PMC-VQA | Accuracy 56.1 | 74 |
| Medical Visual Question Answering | PathVQA | Accuracy 56.5 | 50 |
| Anomaly Detection | Br35H | -- | 45 |
| Medical Question Answering | MedQA | Accuracy 63.6 | 40 |
| Image-level Anomaly Detection | HeadCT | -- | 37 |
| Medical Question Answering | MedXpertQA | Accuracy 16.5 | 31 |