
Adaptive Reinforcement for Open-ended Medical Reasoning via Semantic-Guided Reward Collapse Mitigation

About

Reinforcement learning (RL) with rule-based reward functions has recently shown great promise in enhancing the reasoning depth and generalization ability of vision-language models (VLMs) while maintaining computational efficiency. Despite these advances, its adoption in medical imaging remains limited. Current reinforcement fine-tuning (RFT) efforts in this field mainly focus on closed-ended visual question answering (VQA), which restricts their applicability to realistic clinical reasoning. Open-ended medical VQA better mirrors clinical diagnostic workflows, yet it remains underexplored. Although several studies have attempted to bridge the two formats through semantically guided RL, model-driven semantic rewards often suffer from reward collapse, where responses with distinct semantics yield nearly identical scores. To overcome this limitation, we introduce Adaptive Reinforcement for Medical Reasoning (ARMed), a novel RL framework tailored for open-ended medical VQA. ARMed first injects domain expertise through supervised fine-tuning (SFT) on chain-of-thought annotations, then applies reinforcement optimization with textual correctness and adaptive semantic rewards to refine reasoning consistency and factual accuracy. Extensive experiments on six challenging medical VQA benchmarks demonstrate that ARMed substantially improves both accuracy and generalization. These findings underscore the importance of reward discriminability in medical RL and highlight the potential of adaptive semantic rewards for building robust, clinically reliable multimodal reasoning systems.
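To make the notion of reward collapse concrete, the following is a minimal illustrative sketch, not the ARMed reward described in the abstract. It assumes a hypothetical group of semantic-similarity scores for four sampled responses and uses plain group-wise standardization (zero mean, unit variance) as a stand-in for any mechanism that restores discriminability. The function names and the score values are invented for illustration.

```python
from statistics import mean, pstdev


def reward_spread(rewards):
    """Return the gap between the best and worst reward in a group."""
    return max(rewards) - min(rewards)


def rescale_group(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    This is ordinary z-scoring, used here only as a stand-in for an
    adaptive reward; it is NOT the method proposed in the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma < eps:
        # All responses scored identically: no learning signal to recover.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Hypothetical semantic scores for four responses with different meanings.
# The scores are nearly identical, which is the collapse symptom.
collapsed = [0.91, 0.92, 0.90, 0.93]
rescaled = rescale_group(collapsed)

print("raw spread:     ", round(reward_spread(collapsed), 4))
print("rescaled spread:", round(reward_spread(rescaled), 4))
```

The ranking of responses is unchanged by the rescaling; only the gap between them grows, which is the property the abstract refers to as reward discriminability.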

Yizhou Liu, Dingkang Yang, Zizhi Chen, Minghao Han, Xukun Zhang, Keliang Liu, Jingwei Wei, Lihua Zhang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Medical Visual Question Answering | SLAKE (test) | -- | -- | 56
Medical Visual Question Answering | PMC-VQA (test) | Accuracy | 48.75 | 36
Medical Visual Question Answering | VQA-Med (test) | ROUGE-1 | 23.17 | 17
Medical Visual Question Answering | MedXpert (test) | Accuracy | 22.3 | 12
Medical Visual Question Answering | Path-VQA (test) | BLEU-1 | 63.61 | 12