Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning

About

While reasoning-enhanced large language models perform strongly on English medical tasks, a persistent multilingual gap remains: reasoning in local languages is substantially weaker, which limits equitable global medical deployment. To bridge this gap, we introduce Med-CoReasoner, a language-informed co-reasoning framework that elicits parallel English and local-language reasoning, abstracts both into structured concepts, and integrates local clinical knowledge into an English logical scaffold via concept-level alignment and retrieval. This design combines the structural robustness of English reasoning with the practice-grounded expertise encoded in local languages. To evaluate multilingual medical reasoning beyond multiple-choice settings, we construct MultiMed-X, a benchmark covering seven languages with expert-annotated long-form question answering and natural language inference tasks, comprising 350 instances per language. Experiments across three benchmarks show that Med-CoReasoner improves multilingual reasoning performance by an average of 5%, with particularly substantial gains in low-resource languages. Moreover, model-distillation experiments and expert evaluation further confirm that Med-CoReasoner produces clinically sound and culturally grounded reasoning traces.
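
The abstract names four stages: parallel English and local-language reasoning, concept abstraction, concept-level alignment with retrieval, and integration into the English scaffold. The toy Python sketch below walks through those stages on a two-step example. It is not the authors' implementation; every name is hypothetical, substring matching stands in for concept abstraction, a hand-written bilingual lexicon stands in for concept-level alignment, an exact-key dictionary lookup stands in for retrieval, and the knowledge-base entries are placeholders rather than clinical content.

```python
from dataclasses import dataclass, field

# Hypothetical toy sketch of a language-informed co-reasoning pipeline.
# All names (ReasoningTrace, abstract_concepts, align_concepts,
# retrieve_local, integrate) are illustrative assumptions.


@dataclass
class ReasoningTrace:
    language: str
    steps: list[str]
    concepts: set[str] = field(default_factory=set)


def abstract_concepts(trace: ReasoningTrace, vocabulary: set[str]) -> set[str]:
    """Assumption: a concept is any vocabulary term that occurs in some step."""
    text = " ".join(trace.steps).lower()
    trace.concepts = {term for term in vocabulary if term.lower() in text}
    return trace.concepts


def align_concepts(en: set[str], local: set[str], lexicon: dict[str, str]) -> dict[str, str]:
    """Map local concepts to English via a hypothetical bilingual lexicon,
    keeping only pairs whose English side also appears in the English trace."""
    return {lexicon[c]: c for c in local if c in lexicon and lexicon[c] in en}


def retrieve_local(aligned: dict[str, str], knowledge_base: dict[str, str]) -> dict[str, str]:
    """Exact-key lookup of local knowledge per aligned concept (stand-in for a retriever)."""
    return {en: knowledge_base[loc] for en, loc in aligned.items() if loc in knowledge_base}


def integrate(en_trace: ReasoningTrace, evidence: dict[str, str]) -> list[str]:
    """Keep the English steps as the scaffold and attach each retrieved local
    note after every step that mentions the corresponding English concept."""
    fused = []
    for step in en_trace.steps:
        fused.append(step)
        for concept in sorted(evidence):
            if concept.lower() in step.lower():
                fused.append(f"  [local: {concept}] {evidence[concept]}")
    return fused


# Usage with made-up traces and placeholder knowledge entries.
en = ReasoningTrace("en", ["Fever and cough suggest pneumonia.", "Choose treatment per guideline."])
zh = ReasoningTrace("zh", ["发热和咳嗽提示肺炎。", "按当地指南选择治疗。"])
lexicon = {"发热": "fever", "咳嗽": "cough", "肺炎": "pneumonia", "指南": "guideline"}
knowledge_base = {"肺炎": "<placeholder local practice note>", "指南": "<placeholder local guideline>"}

en_concepts = abstract_concepts(en, set(lexicon.values()))
zh_concepts = abstract_concepts(zh, set(lexicon))
aligned = align_concepts(en_concepts, zh_concepts, lexicon)
evidence = retrieve_local(aligned, knowledge_base)
fused = integrate(en, evidence)
for line in fused:
    print(line)
```

Keeping the English steps as the skeleton and only appending local notes mirrors the abstract's idea of an English logical scaffold enriched with local knowledge; a real system would replace each stand-in with model-based extraction, alignment, and retrieval.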

Fan Gao, Sherry T. Tong, Jiwoong Sohn, Jiahao Huang, Junfeng Jiang, Ding Xia, Piyalitt Ittichaiwong, Kanyakorn Veerakanjana, Hyunjae Kim, Qingyu Chen, Edison Marrese Taylor, Kazuma Kobayashi, Akiko Aizawa, Irene Li • 2026

Related benchmarks

Task	Dataset	Metric	Result
Multiple-choice Question Answering	Global-MMLU Medical	Accuracy (ZH)	89.1	17
Multiple-choice Question Answering	MMLU ProX Health	Accuracy (ZH)	76.56	17
Medical Question Answering	MMedBench 1.0 (test)	Chinese Accuracy	84.47	9
Long-form QA	MultiMed-X EN	Overall Score	4.6	5
Long-form QA	MultiMed-X ZH	Overall Score	4.53	5
Long-form QA	MultiMed-X JP	Overall Score	4.43	5
Long-form QA	MultiMed-X KO	Overall Score	4.54	5
Long-form QA	MultiMed-X SW	Overall Score	4.55	5
Long-form QA	MultiMed-X TH	Overall Score	4.66	5
Long-form QA	MultiMed-X YO	Overall Score	4.45	5
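
As a reading aid, the snippet below copies the seven MultiMed-X overall scores listed in the table and summarizes them. The unweighted mean and the non-English mean are our own arithmetic, not figures reported by the authors, and the table's trailing numeric column is ignored because its meaning is not stated on the page.

```python
from statistics import mean

# Med-CoReasoner overall scores on MultiMed-X long-form QA, copied from the table.
# Language codes are kept exactly as the table writes them (e.g. JP, not ja).
scores = {"EN": 4.6, "ZH": 4.53, "JP": 4.43, "KO": 4.54, "SW": 4.55, "TH": 4.66, "YO": 4.45}

avg = mean(scores.values())                                  # unweighted mean over 7 languages
non_en = mean(v for k, v in scores.items() if k != "EN")     # mean over the 6 non-English languages
best = max(scores, key=scores.get)                           # language with the highest score
worst = min(scores, key=scores.get)                          # language with the lowest score

print(f"mean={avg:.2f} best={best} worst={worst} non_en_mean={non_en:.2f}")
# prints: mean=4.54 best=TH worst=JP non_en_mean=4.53
```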
