
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

About

We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipeline to model the systematic workflow of a physician. Key capabilities include: (i) proactive information acquisition to resolve ambiguity; (ii) long-horizon reasoning that unifies scattered evidence into coherent diagnoses; and (iii) adaptive hallucination suppression to ensure factual reliability. Empirical evaluations demonstrate that Baichuan-M3 achieves state-of-the-art results on HealthBench, the newly introduced HealthBench-Hallu and ScanBench, significantly outperforming GPT-5.2 in clinical inquiry, advisory and safety. The models are publicly available at https://huggingface.co/collections/baichuan-inc/baichuan-m3.

Baichuan-M3 Team: Chengfeng Dou, Fan Yang, Fei Li, Jiyuan Jia, Qiang Ju, Shuai Wang, Tianpeng Li, Xiangrong Zeng, Yijie Zhou, Hongda Zhang, Jinyang Tai, Linzhuang Sun, Peidong Guo, Yichuan Mo, Xiaochuan Wang, Hengfu Cui, Zhishou Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Medical Reasoning | HealthBench | -- | -- | 36 |
| Medical Test Recommendation | MedR-Bench | Precision | 52 | 9 |
| Medical Diagnosis | MedR-Bench | Diagnostic Accuracy | 69 | 9 |
| Medical Diagnosis | MedAction 300-Hard | Diagnostic Accuracy | 66 | 9 |
| Medical Test Recommendation | MedAction 300-Hard | Precision | 50 | 9 |
| Hallucination Suppression | HealthBench Hallu | Refuted Rate | 2.45 | 4 |
