MedMIX: Modality-Internal Expert Fusion for Multimodal Medical Diagnosis
About
Multimodal clinical prediction faces three challenges: multiple foundation models (FMs) with complementary strengths per modality, pervasive missing modalities at training and test time, and sample-specific variation in modality contributions. We introduce MedMIX, a multimodal framework that combines intra-modality expert fusion, learned inter-modality fusion, and training-only large-small model collaboration for robust medical prediction under incomplete modalities. Within each modality, MedMIX aggregates complementary embeddings from multiple small expert models; across modalities, it performs learned fusion over available modalities; and during training, it leverages large teacher models to improve deployed representations without additional inference cost. Across three heterogeneous benchmarks (OpenI, MIMIC-IV-MM, and MMIST-ccRCC), MedMIX achieves consistently strong performance while remaining robust under controlled missing-modality perturbations, and further demonstrates sustained robustness under cross-cohort shift on MIMIC-III. These results highlight MedMIX as a practical framework that unifies within-modality expert collaboration, sample-specific cross-modality fusion, and efficient large-small model collaboration while remaining robust to incomplete modalities.
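The two fusion stages described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the fusion operators, so the softmax-weighted averaging, the function names (`masked_softmax`, `fuse_experts`, `fuse_modalities`), and the hand-set scores standing in for learned, sample-specific gate outputs are all assumptions made for illustration.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Softmax over the last axis; entries where mask is False get weight 0."""
    assert mask.any(), "at least one entry must be available"
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def fuse_experts(expert_embeddings, gate_logits):
    """Intra-modality step (assumed form): weighted average of expert embeddings.

    expert_embeddings: (n_experts, dim), gate_logits: (n_experts,)
    """
    all_present = np.ones(gate_logits.shape, dtype=bool)
    return masked_softmax(gate_logits, all_present) @ expert_embeddings

def fuse_modalities(modality_embeddings, scores, available):
    """Inter-modality step (assumed form): weighted average over available modalities.

    modality_embeddings: (n_modalities, dim); rows for missing modalities
    must be finite placeholders (e.g. zeros) because they are multiplied by 0.
    """
    weights = masked_softmax(scores, available)
    return weights @ modality_embeddings, weights

# Hypothetical sample: image and text are present, the tabular modality is missing.
rng = np.random.default_rng(0)
image = fuse_experts(rng.normal(size=(3, 4)), np.zeros(3))  # 3 image experts
text = fuse_experts(rng.normal(size=(2, 4)), np.zeros(2))   # 2 text experts
tabular = np.zeros(4)                                       # placeholder for missing
stacked = np.stack([image, text, tabular])
available = np.array([True, True, False])
fused, weights = fuse_modalities(stacked, np.array([0.2, -0.1, 0.5]), available)
```

In this sketch the missing modality receives exactly zero weight, and the remaining weights are renormalized to sum to one, which is one simple way to keep the fused representation well defined when modalities are absent. In a trained model the scores would presumably come from a learned network conditioned on the sample rather than from fixed constants.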
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Disease Diagnosis | Open-i | Accuracy | 94.63 | 41 |
| Multimodal Medical Classification | MIMIC-IV-MM | AUROC | 71.68 | 11 |
| Multimodal Medical Classification | MMIST-ccRCC | AUROC | 81.21 | 11 |
| Medical Classification | MIMIC-III (external val) | AUROC | 0.5718 | 8 |