Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning
About
While reasoning-enhanced large language models perform strongly on English medical tasks, a persistent multilingual gap remains, with substantially weaker reasoning in local languages, limiting equitable global medical deployment. To bridge this gap, we introduce Med-CoReasoner, a language-informed co-reasoning framework that elicits parallel English and local-language reasoning, abstracts them into structured concepts, and integrates local clinical knowledge into an English logical scaffold via concept-level alignment and retrieval. This design combines the structural robustness of English reasoning with the practice-grounded expertise encoded in local languages. To evaluate multilingual medical reasoning beyond multiple-choice settings, we construct MultiMed-X, a benchmark covering seven languages with expert-annotated long-form question answering and natural language inference tasks, comprising 350 instances per language. Experiments across three benchmarks show that Med-CoReasoner improves multilingual reasoning performance by an average of 5%, with particularly substantial gains in low-resource languages. Moreover, model distillation and expert evaluation analysis further confirm that Med-CoReasoner produces clinically sound and culturally grounded reasoning traces.
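The concept-level integration step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Concept` schema, the `jaccard` token-overlap similarity, the `co_reason` function, and the `threshold` value are all hypothetical stand-ins, and the retrieval component is omitted entirely. It only shows the general idea of keeping English concepts as the scaffold and adding local-language concepts that the scaffold does not already cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A structured concept abstracted from one reasoning chain (hypothetical schema)."""
    text: str    # concept text, assumed here to be rendered in English for comparison
    source: str  # "en" or a local-language code such as "sw"


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for a real concept matcher."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def co_reason(en_concepts, local_concepts, threshold=0.5):
    """Keep English concepts as the scaffold and append every local concept
    that no English concept matches at or above the threshold."""
    merged = list(en_concepts)
    for lc in local_concepts:
        covered = any(jaccard(lc.text, ec.text) >= threshold for ec in en_concepts)
        if not covered:
            merged.append(lc)
    return merged


# Illustrative, made-up concepts (not taken from the paper or its benchmark).
en = [
    Concept("fever and headache suggest malaria", "en"),
    Concept("order blood smear", "en"),
]
local = [
    Concept("order blood smear", "sw"),
    Concept("first-line treatment follows national guideline", "sw"),
]
merged = co_reason(en, local)
```

In this toy run the duplicated local concept is dropped and only the locally specific one is appended to the English scaffold. A real system would presumably use a multilingual embedding or an LLM-based matcher in place of token overlap.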
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multiple-choice Question Answering | Global-MMLU Medical | Accuracy (ZH) | 89.1 | 17 |
| Multiple-choice Question Answering | MMLU ProX Health | Accuracy (ZH) | 76.56 | 17 |
| Medical Question Answering | MMedBench 1.0 (test) | Chinese Accuracy | 84.47 | 9 |
| Long-form QA | MultiMed-X EN | Overall Score | 4.6 | 5 |
| Long-form QA | MultiMed-X ZH | Overall Score | 4.53 | 5 |
| Long-form QA | MultiMed-X JP | Overall Score | 4.43 | 5 |
| Long-form QA | MultiMed-X KO | Overall Score | 4.54 | 5 |
| Long-form QA | MultiMed-X SW | Overall Score | 4.55 | 5 |
| Long-form QA | MultiMed-X TH | Overall Score | 4.66 | 5 |
| Long-form QA | MultiMed-X YO | Overall Score | 4.45 | 5 |