HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

About

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate exceptional performance and scalability of HealthGPT on unified medical vision tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.

Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Medical Visual Question Answering | OmniMedVQA (test) | CT Accuracy: 70.3 | 29 |
| Multimodal Dental Image Analysis | MMOral-Uni 1.0 (test) | Loc Score: 18 | 28 |
| Open-ended VQA | MMOral-OPG | Teeth Accuracy: 30.64 | 12 |
| Text-only Question Answering | PMC-MI-Bench | BLEU@4: 11.8 | 10 |
| Multi-choice Visual Question Answering | PMC-MI-Bench | Accuracy: 88 | 10 |
| Visual Question Answering | PMC-MI-Bench (test) | BLEU@4: 9.3 | 10 |
| Visual Question Answering | PMC-MI-Bench single-image | BLEU@4: 9.8 | 10 |
| Medical Visual Question Answering | MMMU Med | BMS Score: 50 | 10 |
| X-ray understanding | MIMIC-CXR uncertain as positive | Micro F1 (14 classes): 25.5 | 9 |
| X-ray understanding | MIMIC-CXR uncertain as negative | Micro F1 (14 classes): 24.2 | 8 |
