PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

About

Recent advances in medical multi-modal models focus on specialized image analysis, such as dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models that integrates a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning over 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark, a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol comprising proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered around a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Instead of absolute score rewards, CRPO uses relative preference signals from multi-dimensional comparisons to provide stable and human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.
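The abstract does not specify how CRPO turns comparisons into a training signal, so the following is only an illustrative sketch of the general idea of a relative, multi-dimensional reward, not the authors' algorithm. It assumes a hypothetical `judge` callable that compares a candidate response with a reference response on one dimension and returns +1 (candidate preferred), 0 (tie), or -1 (reference preferred); the four dimension names are taken from the benchmark protocol, while `comparison_reward`, `toy_judge`, and the averaging rule are assumptions made for illustration.

```python
from typing import Callable, Dict

# The four evaluation dimensions named in the abstract.
DIMENSIONS = ("proactiveness", "accuracy", "usefulness", "language_quality")

# Hypothetical judge signature: (candidate, reference, dimension) -> +1, 0, or -1.
Judge = Callable[[Dict[str, int], Dict[str, int], str], int]


def comparison_reward(candidate: Dict[str, int],
                      reference: Dict[str, int],
                      judge: Judge) -> float:
    """Average the per-dimension preference outcomes into a reward in [-1, 1].

    Assumption: dimensions are weighted equally. The source does not state
    how the comparisons are aggregated.
    """
    outcomes = [judge(candidate, reference, dim) for dim in DIMENSIONS]
    return sum(outcomes) / len(DIMENSIONS)


def toy_judge(candidate: Dict[str, int],
              reference: Dict[str, int],
              dim: str) -> int:
    """Stand-in for a human or model preference judge (illustration only).

    It compares made-up per-dimension ratings and reports only which side
    wins, discarding the size of the gap.
    """
    diff = candidate[dim] - reference[dim]
    return (diff > 0) - (diff < 0)


# Made-up ratings for two responses to the same consultation turn.
candidate = {"proactiveness": 3, "accuracy": 4, "usefulness": 2, "language_quality": 5}
reference = {"proactiveness": 2, "accuracy": 4, "usefulness": 3, "language_quality": 4}

reward = comparison_reward(candidate, reference, toy_judge)
print(reward)  # two wins, one tie, one loss -> 0.25
```

Because only the direction of each comparison is used, the reward is insensitive to how a judge calibrates its absolute scale, which is one plausible reading of why relative signals could be more stable than absolute scores.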

Jiao Xu, Junwei Liu, Jiangwei Lao, Qi Zhu, Yunpeng Zhao, Congyun Jin, Shinan Liu, Zhihong Lu, Lihe Zhang, Xin Chen, Jian Wang, Ping Wang • 2026

Related benchmarks

Task	Dataset	Result
Medical Question Answering	MedMCQA	Accuracy 71.3	591
Medical Visual Question Answering	Slake	Accuracy 85.6	289
Question Answering	MedQA	Accuracy 94.8	96
Multi-modal Question Answering	MedXpertQA-MM	Accuracy 36.7	38
Multi-modal Question Answering	MMMU Health & Medicine	Accuracy 0.694	12
Multi-modal Question Answering	VQA-RAD	Accuracy 87.1	12
Multi-modal Question Answering	PMC-VQA	Accuracy 70.3	12
Multi-modal Question Answering	PathVQA	Accuracy 64.9	12
Multi-modal Question Answering	DermaVQA	Accuracy 42	12
Text-only Question Answering	MedXpertQA text	Accuracy 29.8	12

Showing 10 of 11 rows
