PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

About

Recent advances in medical multi-modal models focus on specialized image analysis such as dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models that integrates a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning over 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark, a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol comprising proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered around a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Compared to absolute score rewards, CRPO uses relative preference signals from multi-dimensional comparisons to provide stable and human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.

Jiao Xu, Junwei Liu, Jiangwei Lao, Qi Zhu, Yunpeng Zhao, Congyun Jin, Shinan Liu, Zhihong Lu, Lihe Zhang, Xin Chen, Jian Wang, Ping Wang • 2026
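
The abstract contrasts CRPO's relative preference signals with absolute score rewards but does not say how the comparisons are aggregated. The sketch below is one possible reading, not the authors' implementation: it assumes a group of candidate responses, a hypothetical `judge` callable that returns +1, -1, or 0 for a pair of responses on one dimension, and a simple average of pairwise outcomes as the reward. The names `comparison_rewards`, `judge`, and `DIMENSIONS` are illustrative only; the four dimension labels are taken from the benchmark protocol described above.

```python
from typing import Callable, List, Sequence

# The four evaluation dimensions named in the abstract.
DIMENSIONS = ("proactiveness", "accuracy", "usefulness", "language_quality")


def comparison_rewards(
    responses: Sequence[str],
    judge: Callable[[str, str, str], int],
    dimensions: Sequence[str] = DIMENSIONS,
) -> List[float]:
    """Turn pairwise, per-dimension preferences into one reward per response.

    `judge(a, b, dim)` is a hypothetical comparator: +1 if `a` is preferred
    over `b` on `dim`, -1 if `b` is preferred, 0 for a tie.
    Averaging the outcomes is an assumption made for this sketch.
    """
    n = len(responses)
    if n < 2:
        return [0.0] * n

    totals = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            for dim in dimensions:
                outcome = judge(responses[i], responses[j], dim)
                totals[i] += outcome  # credit the first response
                totals[j] -= outcome  # and debit the second symmetrically

    # Normalize to [-1, 1]: each response takes part in (n - 1) comparisons
    # on every dimension.
    scale = (n - 1) * len(dimensions)
    return [t / scale for t in totals]


if __name__ == "__main__":
    # Toy judge for demonstration only: prefers the longer response.
    def toy_judge(a: str, b: str, dim: str) -> int:
        return (len(a) > len(b)) - (len(a) < len(b))

    print(comparison_rewards(["a", "bbb", "cc"], toy_judge))
```

Because every comparison credits one response and debits the other by the same amount, the rewards in a group sum to zero, so they depend only on how responses rank against each other rather than on any absolute scoring scale.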

Related benchmarks

Task                               Dataset                 Result          Rank
Medical Question Answering         MedMCQA                 Accuracy 71.3   253
Medical Visual Question Answering  Slake                   Accuracy 85.6   134
Question Answering                 MedQA                   Accuracy 94.8   70
Multi-modal Question Answering     MedXpertQA-MM           Accuracy 36.7   27
Multi-modal Question Answering     MMMU Health & Medicine  Accuracy 0.694  12
Multi-modal Question Answering     VQA-RAD                 Accuracy 87.1   12
Multi-modal Question Answering     PMC-VQA                 Accuracy 70.3   12
Multi-modal Question Answering     PathVQA                 Accuracy 64.9   12
Multi-modal Question Answering     DermaVQA                Accuracy 42     12
Text-only Question Answering       MedXpertQA text         Accuracy 29.8   12
Showing 10 of 11 rows
