
KG-CMI: Knowledge graph enhanced cross-Mamba interaction for medical visual question answering

About

Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent methods fail to fully leverage domain-specific medical knowledge, making it difficult to accurately associate lesion features in medical images with key diagnostic criteria. Additionally, classification-based approaches typically rely on predefined answer sets. Treating Med-VQA as a simple classification problem limits its ability to adapt to the diversity of free-form answers and may overlook detailed semantic information in those answers. To address these challenges, we propose a knowledge graph enhanced cross-Mamba interaction (KG-CMI) framework, which consists of a fine-grained cross-modal feature alignment (FCFA) module, a knowledge graph embedding (KGE) module, a cross-modal interaction representation (CMIR) module, and a free-form answer enhanced multi-task learning (FAMT) module. KG-CMI learns cross-modal feature representations for images and texts by integrating professional medical knowledge through a knowledge graph, establishing associations between lesion features and disease knowledge. Moreover, FAMT leverages auxiliary knowledge from open-ended questions, improving the model's capability for open-ended Med-VQA. Experimental results demonstrate that KG-CMI outperforms existing state-of-the-art methods on three Med-VQA datasets: VQA-RAD, SLAKE, and OVQA. Additionally, we conduct interpretability experiments to further validate the framework's effectiveness.

Xianyao Zheng, Hong Yu, Hui Cui, Changming Sun, Xiangyu Li, Ran Su, Leyi Wei, Jia Zhou, Junbo Wang, Qiangguo Jin • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Medical Visual Question Answering | VQA-RAD | Accuracy: 78.21 | 198 |
| Medical Visual Question Answering | OVQA | Accuracy: 79.58 | 17 |
| Medical Visual Question Answering | SLAKE | Open Score: 82.78 | 10 |
| Medical Visual Question Answering | VQA-RAD | BLEU-1: 0.438 | 3 |
