Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

About

We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipeline to model the systematic workflow of a physician. Key capabilities include: (i) proactive information acquisition to resolve ambiguity; (ii) long-horizon reasoning that unifies scattered evidence into coherent diagnoses; and (iii) adaptive hallucination suppression to ensure factual reliability. Empirical evaluations demonstrate that Baichuan-M3 achieves state-of-the-art results on HealthBench, the newly introduced HealthBench-Hallu and ScanBench, significantly outperforming GPT-5.2 in clinical inquiry, advisory and safety. The models are publicly available at https://huggingface.co/collections/baichuan-inc/baichuan-m3.

Baichuan-M3 Team: Chengfeng Dou, Fan Yang, Fei Li, Jiyuan Jia, Qiang Ju, Shuai Wang, Tianpeng Li, Xiangrong Zeng, Yijie Zhou, Hongda Zhang, Jinyang Tai, Linzhuang Sun, Peidong Guo, Yichuan Mo, Xiaochuan Wang, Hengfu Cui, Zhishou Zhang • 2026

Related benchmarks

Task	Dataset	Result	(unlabeled)
Medical Reasoning	HealthBench	--	36
Microbiota-to-metabolites feature expansion	CAG-Tongue supplementary expansion task (40/60 collected-uncollected split)	MSE 0.1341	11
Intra-cohort feature expansion	CAG-Tongue (intra-cohort)	MSE 1.45e+5	11
Medical Test Recommendation	MedR-Bench	Precision 52	9
Medical Diagnosis	MedR-Bench	Diagnostic Accuracy 69	9
Medical Diagnosis	MedAction 300-Hard	Diagnostic Accuracy 66	9
Medical Test Recommendation	MedAction 300-Hard	Precision 50	9
Hallucination Suppression	HealthBench-Hallu	Refuted Rate 2.45	4

