HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

About

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To effectively learn the HealthGPT, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate exceptional performance and scalability of HealthGPT in medical visual unified tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.

Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi• 2025

Related benchmarks

Task	Dataset	Result
Medical Visual Question Answering	VQA-RAD	Accuracy58.3	228
Medical Image Synthesis	BraTS	SSIM73.19	108
Medical Report Generation	MIMIC-CXR (test)	ROUGE-L0.214	100
Medical Visual Question Answering	PathVQA	--	92
Medical Visual Question Answering	PathVQA	Accuracy44.4	80
Medical Image Classification	DermaMNIST	Accuracy33.3	63
Open-ended VQA	MMOral-OPG	Teeth Accuracy30.64	55
Medical Visual Question Answering	OmniMedVQA (test)	CT Accuracy70.3	50
Medical Visual Question Answering	OmniMedVQA	Accuracy74.4	48
Medical Image Classification	Kvasir	Accuracy34.5	37

Showing 10 of 83 rows

...

Other info

Follow for update

@wizwand_team Discord