
UniMedVL: Unifying Medical Multimodal Understanding and Generation through Observation-Knowledge-Analysis

About

Medical workflows routinely combine reading images with producing visual and textual outputs, making both image understanding and generation central to medical AI. Most existing systems, however, address these abilities in isolated models, losing the shared knowledge that a unified architecture could exploit. To bridge this gap, we present UniMedVL, the first unified medical model that seamlessly integrates multimodal understanding and generation capabilities within a single model without switching weights. We achieve this via a tailored progressive training pipeline where understanding and generation mutually reinforce each other. To effectively train UniMedVL, we curate UniMedVL-5M, the first large-scale medical dataset comprising over 5.6M instances across 8 medical imaging modalities, tailored for multimodal input-output tasks in unified medical understanding and generation. Experimental results demonstrate that UniMedVL achieves competitive performance on five medical understanding benchmarks. Crucially, UniMedVL natively supports diverse interleaved generation tasks, e.g., virtual staining, super-resolution, cross-modal synthesis, essential for complex medical workflows. Our code and dataset are publicly available.

Junzhi Ning, Wei Li, Cheng Tang, Jiashi Lin, Chenglong Ma, Chaoyang Zhang, Jiyao Liu, Ying Chen, Shujian Gao, Yuandong Pu, Huihui Xu, Chenhui Gou, Ziyan Huang, Yi Xin, Qi Qin, Diping Song, Bin Fu, Guang Yang, Yuanfeng Ji, Tianbin Li, Yanzhou Su, Jin Ye, Shixiang Tang, Zhongying Deng, Lihao Liu, Ming Hu, Junjun He • 2025

Related benchmarks

Task                               Dataset           Metric      Result  Rank
Medical Visual Question Answering  VQA-RAD           Accuracy    61.9    228
Medical Image Segmentation         BUSI              Dice Score  14.87   134
Medical Image Synthesis            BraTS             SSIM        81.89   108
Medical Image Segmentation         GLAS              Dice        52.86   106
Medical Report Generation          MIMIC-CXR (test)  ROUGE-L     0.2727  100
Medical Visual Question Answering  PathVQA           Accuracy    53.5    80
Medical Image Segmentation         ISIC              DICE        48.62   79
Medical Image Segmentation         REFUGE            Dice Score  0.5586  52
Medical Image Segmentation         Kvasir            mDice       35.55   49
Medical Visual Question Answering  OmniMedVQA        Accuracy    85.8    48
Showing 10 of 78 rows