MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow

About

In modern medicine, clinical diagnosis relies on the comprehensive analysis of primarily textual and visual data, drawing on medical expertise to ensure systematic and rigorous reasoning. Recent advances in large Vision-Language Models (VLMs) and agent-based methods hold great potential for medical diagnosis, thanks to their ability to effectively integrate multi-modal patient data. However, they often provide direct answers and draw empirically driven conclusions without quantitative analysis, which reduces their reliability and clinical usability. We propose MedAgent-Pro, a new agentic reasoning paradigm that follows the diagnosis principle in modern medicine and decouples the process into sequential components for step-by-step, evidence-based reasoning. Our MedAgent-Pro workflow presents a hierarchical diagnostic structure to mirror this principle, consisting of disease-level standardized plan generation and patient-level personalized step-by-step reasoning. To support disease-level planning, a RAG-based agent is designed to retrieve medical guidelines to ensure alignment with clinical standards. For patient-level reasoning, we propose to integrate professional tools such as visual models to enable quantitative assessments. Meanwhile, we propose to verify the reliability of each step to achieve evidence-based diagnosis, enforcing rigorous logical reasoning and a well-founded conclusion. Extensive experiments across a wide range of anatomical regions, imaging modalities, and diseases demonstrate the superiority of MedAgent-Pro over mainstream VLMs, agentic systems, and state-of-the-art expert models. Ablation studies and human evaluation by clinical experts further validate its robustness and clinical relevance. Code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
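The hierarchical structure described in the abstract (disease-level planning, then patient-level tool use with per-step verification) can be pictured with a small sketch. This is not the authors' implementation: the names `Step`, `Evidence`, `generate_plan`, `run_patient_reasoning`, and `diagnose` are hypothetical, a dictionary lookup stands in for RAG-based guideline retrieval, lambdas stand in for visual tools, the thresholds are illustrative rather than clinical values, and the majority vote over reliable evidence is an assumed aggregation rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    name: str         # indicator to assess (hypothetical example names below)
    tool: str         # name of the tool that measures the indicator
    threshold: float  # illustrative cut-off, NOT a clinical value


@dataclass
class Evidence:
    step: str
    value: Optional[float]
    reliable: bool


def generate_plan(disease: str, guidelines: Dict[str, List[Step]]) -> List[Step]:
    """Disease-level planning: a dict lookup stands in for guideline retrieval."""
    return guidelines.get(disease, [])


def run_patient_reasoning(
    plan: List[Step],
    tools: Dict[str, Callable[[dict], float]],
    patient: dict,
    verify: Callable[[Step, float], bool],
) -> List[Evidence]:
    """Patient-level reasoning: run each step's tool, then verify its output."""
    evidence = []
    for step in plan:
        tool = tools.get(step.tool)
        value = tool(patient) if tool is not None else None
        reliable = value is not None and verify(step, value)
        evidence.append(Evidence(step.name, value, reliable))
    return evidence


def diagnose(plan: List[Step], evidence: List[Evidence]) -> Optional[bool]:
    """Assumed aggregation: strict majority vote over reliable evidence only."""
    thresholds = {s.name: s.threshold for s in plan}
    votes = [e.value >= thresholds[e.step] for e in evidence if e.reliable]
    if not votes:
        return None  # no reliable evidence, so no conclusion is drawn
    return sum(votes) * 2 > len(votes)


# Hypothetical guideline store, tools, and patient record for illustration.
guidelines = {
    "glaucoma": [
        Step("cup_to_disc_ratio", "cdr_tool", 0.6),
        Step("rim_thinning_score", "rim_tool", 0.5),
    ]
}
tools = {"cdr_tool": lambda p: p["cdr"], "rim_tool": lambda p: p["rim"]}
verify = lambda step, value: 0.0 <= value <= 1.0  # simple range check
patient = {"cdr": 0.72, "rim": 1.4}  # rim value is out of range on purpose

plan = generate_plan("glaucoma", guidelines)
evidence = run_patient_reasoning(plan, tools, patient, verify)
result = diagnose(plan, evidence)
```

In this toy run the out-of-range `rim` value fails verification, so only the first indicator contributes to the conclusion. The point of the sketch is the separation of concerns: the plan depends only on the disease, while measurement and verification depend on the individual patient.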

Ziyue Wang, Junde Wu, Linghan Cai, Chang Han Low, Xihong Yang, Qiaxuan Li, Yueming Jin • 2025

Related benchmarks

Task	Dataset	Metric	Result
Medical Visual Question Answering	Slake	Accuracy	69.4	247
Medical Visual Question Answering	VQA-RAD	Accuracy	63.3	228
Medical Visual Question Answering	PathVQA	Overall Accuracy	58.5	92
Skin lesion classification	HAM10000	Accuracy	57.63	20
Heart Disease Diagnosis	MITEA	Balanced Accuracy	77.8	11
Concept Annotation	Derm7pt	F1-Macro	64.82	11
Glaucoma Diagnosis	REFUGE2	BACC	90.4	11
Dermatological diagnosis	SNU	Accuracy	11.6	11
Concept Annotation	SkinCon	F1-Macro	18.34	11
Clinical Captioning	SkinCAP	ROUGE-L	11.48	11

