Toward Clinically Explainable AI for Medical Diagnosis: A Foundation Model with Human-Compatible Reasoning via Reinforcement Learning
About
The clinical adoption of artificial intelligence (AI) in medical diagnostics is hampered by its black-box nature, which prevents clinicians from verifying the rationale behind automated decisions. To overcome this barrier, we introduce DeepMedix-R1, a foundation model (FM) for chest X-ray (CXR) interpretation that generates not only accurate diagnoses but also a transparent, step-by-step reasoning process grounded in specific visual evidence. Our methodology employs a sequential training strategy: instruction fine-tuning, followed by a cold-start phase to elicit reasoning capabilities, and finally reinforcement learning with grounded rewards, which refines the model so that both its diagnostic outputs and its reasoning pathways align with clinical plausibility. Quantitative assessments show that DeepMedix-R1 substantially outperforms advanced FMs on report generation and visual question answering tasks. We also introduce Report Arena, a novel LLM-based benchmark, on which DeepMedix-R1 ranks first among competing models for output quality. Finally, a formal review by clinical experts reveals a strong preference for DeepMedix-R1's generated reasoning over that of the widely adopted Qwen2.5-VL-7B model, confirming its superior interpretability and clinical utility.
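The paper does not specify the exact form of its grounded reward, so the sketch below is a hypothetical illustration of the general idea: a composite reward that scores a generated answer on both diagnostic correctness (set-level F1 over predicted findings) and visual grounding (mean IoU between predicted and reference evidence boxes). All function and parameter names (`grounded_reward`, `alpha`, the box format) are assumptions for illustration, not the authors' implementation.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def grounded_reward(pred_findings, gold_findings,
                    pred_boxes, gold_boxes, alpha=0.5):
    """Toy composite reward: diagnostic F1 blended with grounding IoU.

    alpha weights diagnostic accuracy against visual grounding;
    pred_boxes/gold_boxes are assumed to be aligned pairs.
    """
    p, g = set(pred_findings), set(gold_findings)
    tp = len(p & g)
    f1 = 2 * tp / (len(p) + len(g)) if (p or g) else 1.0
    ious = [iou(pb, gb) for pb, gb in zip(pred_boxes, gold_boxes)]
    grounding = sum(ious) / len(ious) if ious else 0.0
    return alpha * f1 + (1 - alpha) * grounding
```

A perfectly grounded, correct prediction scores 1.0; mislabeled findings or poorly localized evidence boxes pull the reward down, so an RL policy optimizing it is pushed toward answers whose reasoning points at the right image regions, which is the intuition behind rewarding the reasoning pathway and not just the final diagnosis.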
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Radiology Report Generation | MIMIC-CXR (findings) | BLEU-4 | 10.5 | 26 |
| Radiology Report Generation | RadVLM MIMIC-CXR (test) | ROUGE-L | 22.3 | 13 |
| Medical Report Generation | MIMIC-CXR (findings) | BLEU-1 | 34.02 | 10 |
| Medical Report Generation | MIMIC-CXR (impression) | BLEU-1 | 0.2135 | 10 |
| Medical Report Generation | OPEN-I (findings) | BLEU-1 | 0.3958 | 10 |
| Medical Report Generation | MIMIC-CXR & OPEN-I weighted average | BLEU-1 | 29.07 | 10 |
| Visual Question Answering | CXR-VQA (test) | Presence | 92.68 | 10 |
| Visual Question Answering | Ext-VQA (test) | Presence Accuracy | 78.94 | 10 |
| Medical Report Generation | OPEN-I (impression) | BLEU-1 | 0.3426 | 10 |
| Medical Diagnosis | CXR14 (external) | Precision for Edema | 62.94 | 10 |
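Several rows above report BLEU-1, i.e. unigram precision against a reference report with a brevity penalty. As a reference point, here is a minimal self-contained BLEU-1 (single reference, whitespace tokenization, no smoothing); production evaluation would normally use a standard implementation such as NLTK's `sentence_bleu` instead.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """BLEU-1: clipped unigram precision times brevity penalty.

    Single-reference, whitespace-tokenized sketch of the metric.
    """
    cand = candidate.split()
    ref = reference.split()
    ref_counts = Counter(ref)
    # Clip each candidate unigram count by its count in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = clipped / len(cand) if cand else 0.0
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * precision
```

For example, `bleu1("the lungs are clear", "lungs are clear")` yields 0.75: three of four candidate unigrams match, and the candidate is not shorter than the reference, so no brevity penalty applies.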