
Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning

About

Medical imaging provides critical evidence for clinical diagnosis, treatment planning, and surgical decisions, yet most existing imaging models are narrowly focused and require multiple specialized networks, limiting their generalization. Although large-scale language and multimodal models exhibit strong reasoning and multi-task capabilities, real-world clinical applications demand precise visual grounding, multimodal integration, and chain-of-thought reasoning. We introduce Citrus-V, a multimodal medical foundation model that combines image analysis with textual reasoning. The model integrates detection, segmentation, and multimodal chain-of-thought reasoning, enabling pixel-level lesion localization, structured report generation, and physician-like diagnostic inference in a single framework. We propose a novel multimodal training approach and release a curated open-source data suite covering reasoning, detection, segmentation, and document understanding tasks. Evaluations demonstrate that Citrus-V outperforms existing open-source medical models and expert-level imaging systems across multiple benchmarks, delivering a unified pipeline from visual grounding to clinical reasoning and supporting precise lesion quantification, automated reporting, and reliable second opinions.

Guoxin Wang, Jun Zhao, Xinyi Liu, Yanbo Liu, Xuyang Cao, Chao Li, Zhuoyun Liu, Qintian Sun, Fangru Zhou, Haoqiang Xing, Zhenhong Yang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 55.1 | 521 |
| Medical Visual Question Answering | Slake | Accuracy | 84.9 | 247 |
| Medical Visual Question Answering | VQA-RAD | Accuracy | 64.3 | 228 |
| Medical Question Answering | MedQA | Accuracy | 64.9 | 154 |
| Medical Question Answering | PubMedQA | Accuracy | 74.8 | 117 |
| Medical Visual Question Answering | PMC-VQA | Accuracy | 55.6 | 103 |
| Medical Visual Question Answering | PathVQA | Accuracy | 62 | 80 |
| Anomaly Detection | Br35H | -- | -- | 45 |
| Image-level Anomaly Detection | HeadCT | -- | -- | 37 |
| Medical Question Answering | MedXpertQA | Accuracy | 16.9 | 31 |

Showing 10 of 33 rows.
