M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models

About

Medical image analysis is essential to clinical diagnosis and treatment, which is increasingly supported by multi-modal large language models (MLLMs). However, previous research has primarily focused on 2D medical images, leaving 3D images under-explored, despite their richer spatial information. This paper aims to advance 3D medical image analysis with MLLMs. To this end, we present a large-scale 3D multi-modal medical dataset, M3D-Data, comprising 120K image-text pairs and 662K instruction-response pairs specifically tailored for various 3D medical tasks, such as image-text retrieval, report generation, visual question answering, positioning, and segmentation. Additionally, we propose M3D-LaMed, a versatile multi-modal large language model for 3D medical image analysis. Furthermore, we introduce a new 3D multi-modal medical benchmark, M3D-Bench, which facilitates automatic evaluation across eight tasks. Through comprehensive evaluation, our method proves to be a robust model for 3D medical image analysis, outperforming existing solutions. All code, data, and models are publicly available at: https://github.com/BAAI-DCAI/M3D.

Fan Bai, Yuxin Du, Tiejun Huang, Max Q.-H. Meng, Bo Zhao • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-Modal Visual Question Answering (MMVQA) | RAD-ChestCT (val) | Accuracy | 24.28 | 57 |
| Multi-Modal Visual Question Answering (MMVQA) | CT-RATE (val) | Accuracy | 22.68 | 57 |
| Radiology Report Generation | CT-RATE (test) | BL-1 | 44.95 | 37 |
| Classification | CT-RATE | AUC | 0.807 | 29 |
| CT Report Generation | CTRG-Chest-548K (test) | BLEU-4 | 30.86 | 28 |
| Report Generation | CT-RATE | F1 Score | 0.27 | 26 |
| Classification | Rad-ChestCT | AUC | 69.8 | 25 |
| Classification | CC-CCII | Accuracy | 83.8 | 24 |
| Segmentation | BraTS 2021 | Dice (ET) | 45 | 18 |
| Tumor analysis | TumorCoT 1.5M (test) | Organ Position | 32.39 | 17 |
Showing 10 of 75 rows
