AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis

About

Medical data collected for diagnostic decisions are typically multimodal, providing comprehensive information on a subject. While computer-aided diagnosis systems can benefit from multimodal inputs, effectively fusing such data remains a challenging task and a key focus in medical research. In this paper, we propose a transformer-based framework, called Alifuse, for aligning and fusing multimodal medical data. Specifically, we convert medical images and both unstructured and structured clinical records into vision and language tokens, employing intramodal and intermodal attention mechanisms to learn unified representations of all imaging and non-imaging data for classification. Additionally, we integrate restoration modeling with contrastive learning frameworks, jointly learning the high-level semantic alignment between images and texts and the low-level understanding of one modality with the help of another. We apply Alifuse to classify Alzheimer's disease, achieving state-of-the-art performance on five public datasets and outperforming eight baselines.
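The high-level image-text alignment mentioned above is commonly trained with a symmetric contrastive (InfoNCE-style) objective. The following is a minimal sketch of such a loss, not the authors' implementation: the function names, the pure-Python list representation of embeddings, and the temperature of 0.07 are all assumptions made for illustration. The restoration modeling and the intramodal and intermodal attention layers are not covered here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same subject;
    every other pairing in the batch is treated as a negative.
    The temperature value is an assumed default, not taken from the paper.
    """
    if not image_embs or len(image_embs) != len(text_embs):
        raise ValueError("expected two non-empty batches of equal size")

    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each row should peak on its diagonal entry.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: each column should peak on its diagonal entry.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (image_to_text + text_to_image)
```

In this sketch the loss is small when each image embedding is closest to its own text embedding and grows when the pairs are mismatched, which is the behavior a semantic alignment objective is meant to reward.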

Qiuhui Chen, Yi Hong • 2024

Related benchmarks

Task	Dataset	Metric	Value	
Alzheimer's disease diagnosis	AD-MultiSense CN vs. MCI	Accuracy	85.98	14
Alzheimer's disease diagnosis	AD-MultiSense CN vs. CI	Accuracy	87.23	14
Treatment Planning	Internal Cohort	Accuracy	81.6	12
Treatment Planning	External Cohort	Accuracy	73.3	12
Three-way CN/MCI/AD classification	AD-MultiSense	Accuracy	84.7	3

