
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis

About

Medical data collected for diagnostic decisions are typically multimodal, providing comprehensive information on a subject. While computer-aided diagnosis systems can benefit from multimodal inputs, effectively fusing such data remains a challenging task and a key focus in medical research. In this paper, we propose a transformer-based framework, called Alifuse, for aligning and fusing multimodal medical data. Specifically, we convert medical images and both unstructured and structured clinical records into vision and language tokens, employing intramodal and intermodal attention mechanisms to learn unified representations of all imaging and non-imaging data for classification. Additionally, we integrate restoration modeling with contrastive learning frameworks, jointly learning the high-level semantic alignment between images and texts and the low-level understanding of one modality with the help of another. We apply Alifuse to classify Alzheimer's disease, achieving state-of-the-art performance on five public datasets and outperforming eight baselines.

Qiuhui Chen, Yi Hong • 2024
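
The abstract mentions contrastive learning for high-level semantic alignment between images and texts. As a rough illustration only, a symmetric image-text contrastive loss can be sketched as below. This is not the authors' implementation: the function name `contrastive_alignment_loss`, the temperature of 0.07, and the use of pooled per-sample embeddings are all assumptions made for the example.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_emb, text_emb: arrays of shape (batch, dim); row i of each
    is assumed to come from the same subject (hypothetical setup).
    """
    # L2-normalise so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarities; matching pairs sit on the diagonal.
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-text: each row should pick out its own column.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each column should pick out its own row.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 16))
    txt = img + 0.01 * rng.normal(size=(8, 16))  # nearly aligned pairs
    print("aligned pairs:   ", contrastive_alignment_loss(img, txt))
    print("mismatched pairs:", contrastive_alignment_loss(img, np.roll(txt, 1, axis=0)))
```

Under these assumptions, the loss is small when each image embedding is closest to its own text embedding and grows when the pairing is broken, which is the behaviour an alignment objective of this kind relies on.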

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Alzheimer's disease diagnosis | AD-MultiSense CN vs. MCI | Accuracy 85.98 | 14 |
| Alzheimer's disease diagnosis | AD-MultiSense CN vs. CI | Accuracy 87.23 | 14 |
| Treatment Planning | Internal Cohort | Accuracy 81.6 | 12 |
| Treatment Planning | External Cohort | Accuracy 73.3 | 12 |
| Three-way CN/MCI/AD classification | AD-MultiSense | Accuracy 84.7 | 3 |
