QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

About

Clinical decision-making routinely demands reasoning over heterogeneous data, yet existing multimodal language models (MLLMs) remain largely vision-centric and fail to generalize across clinical specialties. To bridge this gap, we introduce QoQ-Med-7B/32B, the first open generalist clinical foundation model that jointly reasons across medical images, time-series signals, and text reports. QoQ-Med is trained with Domain-aware Relative Policy Optimization (DRPO), a novel reinforcement-learning objective that hierarchically scales normalized rewards according to domain rarity and modality difficulty, mitigating the performance imbalance caused by skewed clinical data distributions. Training on 2.61 million instruction-tuning pairs spanning 9 clinical domains, we show that DRPO boosts diagnostic performance by 43% in macro-F1 on average across all visual domains compared with other critic-free training methods such as GRPO. Furthermore, because QoQ-Med is trained on intensive segmentation data, it can highlight salient regions related to the diagnosis, with an IoU 10x higher than open models while reaching the performance of OpenAI o4-mini. To foster reproducibility and downstream research, we release (i) the full model weights, (ii) the modular training pipeline, and (iii) all intermediate reasoning traces at https://github.com/DDVD233/QoQ_Med.
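The abstract does not give the DRPO formula, so the following is only a rough sketch of the general idea it describes: start from GRPO-style group-normalized rewards, then scale them up for rare or difficult domains. The function names, the square-root rarity weight, and the "1 + (1 - mean reward)" difficulty weight are all assumptions made for illustration. The sketch applies a single flat weight and does not reproduce the hierarchical scaling used by the authors.

```python
import math
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def domain_scaled_advantages(rewards, domain, domain_counts, domain_mean_reward):
    """Hypothetical domain-aware scaling (NOT the paper's exact objective).

    rarity:     grows as the domain's share of the training data shrinks.
    difficulty: grows as the domain's running mean reward (in [0, 1]) drops.
    """
    base = group_normalized_advantages(rewards)
    total = sum(domain_counts.values())
    # Assumed rarity weight: 1.0 for a domain holding an equal share of the data.
    rarity = math.sqrt(total / (len(domain_counts) * domain_counts[domain]))
    # Assumed difficulty weight: between 1.0 (easy) and 2.0 (hard).
    difficulty = 1.0 + (1.0 - domain_mean_reward[domain])
    return [a * rarity * difficulty for a in base]


# Illustrative numbers only; these are not statistics from the paper.
domain_counts = {"chest_xray": 900, "ecg": 100}
domain_mean_reward = {"chest_xray": 0.8, "ecg": 0.3}
group_rewards = [1.0, 0.0, 0.0, 1.0]  # rewards for four sampled responses

cxr_adv = domain_scaled_advantages(group_rewards, "chest_xray", domain_counts, domain_mean_reward)
ecg_adv = domain_scaled_advantages(group_rewards, "ecg", domain_counts, domain_mean_reward)
```

Under these assumed weights, the same group of rewards yields larger-magnitude advantages for the rare, harder "ecg" domain than for the common "chest_xray" domain, which is the rebalancing effect the abstract attributes to DRPO.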

Wei Dai, Peilin Chen, Chanakya Ekbote, Paul Pu Liang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Medical Visual Question Answering | Slake | Accuracy 68.5 | 239 |
| Medical Visual Question Answering | VQA-RAD | Accuracy 74 | 198 |
| Medical Visual Question Answering | PathVQA | Accuracy 63.5 | 50 |
| Medical Visual Question Answering | MedX-M | Accuracy 25 | 18 |
| Medical Visual Question Answering | PMC | Accuracy 51 | 18 |
| Grounded ECG Interpretation | ECG-Grounding | Diagnosis Accuracy 27.01 | 17 |
| Ultrasound Question Answering | U2-Bench disease-diagnosis | Accuracy 41.46 | 16 |
| Fundus reading | FunBench | Accuracy 62.5 | 14 |
| Fundus reading | Omni-Fundus | Accuracy 57.9 | 14 |
| Fundus reading | GMAI-Fundus | Accuracy 32.8 | 14 |
Showing 10 of 15 rows
