Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning

About

Medical imaging provides critical evidence for clinical diagnosis, treatment planning, and surgical decisions, yet most existing imaging models are narrowly focused and require multiple specialized networks, limiting their generalization. Although large-scale language and multimodal models exhibit strong reasoning and multi-task capabilities, real-world clinical applications demand precise visual grounding, multimodal integration, and chain-of-thought reasoning. We introduce Citrus-V, a multimodal medical foundation model that combines image analysis with textual reasoning. The model integrates detection, segmentation, and multimodal chain-of-thought reasoning, enabling pixel-level lesion localization, structured report generation, and physician-like diagnostic inference in a single framework. We propose a novel multimodal training approach and release a curated open-source data suite covering reasoning, detection, segmentation, and document understanding tasks. Evaluations demonstrate that Citrus-V outperforms existing open-source medical models and expert-level imaging systems across multiple benchmarks, delivering a unified pipeline from visual grounding to clinical reasoning and supporting precise lesion quantification, automated reporting, and reliable second opinions.

Guoxin Wang, Jun Zhao, Xinyi Liu, Yanbo Liu, Xuyang Cao, Chao Li, Zhuoyun Liu, Qintian Sun, Fangru Zhou, Haoqiang Xing, Zhenhong Yang • 2025

Related benchmarks

Task                               Dataset     Metric    Result  Rank
Medical Question Answering         MedMCQA     Accuracy  55.1    346
Medical Visual Question Answering  Slake       Accuracy  84.9    239
Medical Visual Question Answering  VQA-RAD     Accuracy  64.3    198
Medical Question Answering         PubMedQA    Accuracy  74.8    92
Medical Visual Question Answering  PMC-VQA     Accuracy  55.6    74
Medical Visual Question Answering  PathVQA     Accuracy  62      50
Anomaly Detection                  Br35H       --        --      45
Medical Question Answering         MedQA       Accuracy  64.9    40
Image-level Anomaly Detection      HeadCT      --        --      37
Medical Question Answering         MedXpertQA  Accuracy  16.9    31
Showing 10 of 33 rows
