Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow

About

In modern medicine, clinical diagnosis relies on the comprehensive analysis of primarily textual and visual data, drawing on medical expertise to ensure systematic and rigorous reasoning. Recent advances in large Vision-Language Models (VLMs) and agent-based methods hold great potential for medical diagnosis, thanks to the ability to effectively integrate multi-modal patient data. However, they often provide direct answers and draw empirical-driven conclusions without quantitative analysis, which reduces their reliability and clinical usability. We propose MedAgent-Pro, a new agentic reasoning paradigm that follows the diagnosis principle in modern medicine, to decouple the process into sequential components for step-by-step, evidence-based reasoning. Our MedAgent-Pro workflow presents a hierarchical diagnostic structure to mirror this principle, consisting of disease-level standardized plan generation and patient-level personalized step-by-step reasoning. To support disease-level planning, an RAG-based agent is designed to retrieve medical guidelines to ensure alignment with clinical standards. For patient-level reasoning, we propose to integrate professional tools such as visual models to enable quantitative assessments. Meanwhile, we propose to verify the reliability of each step to achieve evidence-based diagnosis, enforcing rigorous logical reasoning and a well-founded conclusion. Extensive experiments across a wide range of anatomical regions, imaging modalities, and diseases demonstrate the superiority of MedAgent-Pro to mainstream VLMs, agentic systems and state-of-the-art expert models. Ablation studies and human evaluation by clinical experts further validate its robustness and clinical relevance. Code is available at https://github.com/jinlab-imvr/MedAgent-Pro.

Ziyue Wang, Junde Wu, Linghan Cai, Chang Han Low, Xihong Yang, Qiaxuan Li, Yueming Jin• 2025

Related benchmarks

TaskDatasetResultRank
Medical Visual Question AnsweringSlake
Accuracy69.4
134
Medical Visual Question AnsweringVQA-RAD
Accuracy63.3
106
Medical Visual Question AnsweringPathVQA
Overall Accuracy58.5
86
Showing 3 of 3 rows

Other info

Follow for update