FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

About

Real-world model deployment across multiple domains requires multimodal models to operate under two complementary regimes: (1) multi-task pretraining, in which tasks are co-available at design time and related tasks can borrow representational strength from one another, and (2) continual adaptation, in which new tasks emerge after deployment with previously unseen modality combinations. Neither regime alone suffices: the pretraining task set is never exhaustive, while bypassing joint training forfeits the transfer gains and efficiency available among co-trainable tasks. Sparse Mixture-of-Experts (MoE) is a natural fit for this dual requirement: sparse activation enables modular capacity expansion as new tasks arrive, while routing decouples modality-level computation from task-level composition. In this work, we propose a scalable MoE framework for multi-task pretraining and continual learning across flexible modality combinations. The framework supports training on multimodal tasks with diverse modality configurations by using modality-specific routers, each of which processes tokens from its modality across tasks. It also enables continual learning over sequential multimodal tasks within a fixed-capacity MoE by compressing accumulated expert knowledge into low-rank memory subspaces while expanding only the lightweight routers. We validate the method on multiple healthcare multimodal benchmarks, where it demonstrates competitive multi-task pretraining performance while alleviating catastrophic forgetting and improving parameter efficiency.
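To make the routing idea concrete, here is a minimal sketch of a shared expert pool with one router per modality, where supporting a new modality adds only a router. It is an illustration under stated assumptions, not the authors' implementation: the class name `ModalityRoutedMoE`, the linear experts, the top-k softmax gating, and the modality names `image`, `ehr`, and `notes` are all hypothetical, and the low-rank memory compression described above is not modeled.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random matrix used as a stand-in for learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ModalityRoutedMoE:
    """Toy MoE layer (hypothetical): shared experts, one router per modality."""

    def __init__(self, dim, num_experts, modalities, top_k=2, seed=0):
        self.rng = random.Random(seed)
        self.dim, self.num_experts, self.top_k = dim, num_experts, top_k
        # Fixed-capacity expert pool; each expert is a single linear map here.
        self.experts = [rand_matrix(dim, dim, self.rng) for _ in range(num_experts)]
        # One lightweight router per modality (assumption: a linear scorer).
        self.routers = {m: rand_matrix(num_experts, dim, self.rng) for m in modalities}

    def add_router(self, modality):
        """Register a router for a new modality without touching the experts."""
        self.routers[modality] = rand_matrix(self.num_experts, self.dim, self.rng)

    def route(self, modality, token):
        """Return [(expert_index, gate_weight)] for the top-k experts."""
        logits = matvec(self.routers[modality], token)
        order = sorted(range(self.num_experts), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        peak = max(logits[i] for i in top)
        # Softmax restricted to the selected experts, shifted for stability.
        exps = [math.exp(logits[i] - peak) for i in top]
        total = sum(exps)
        return [(i, e / total) for i, e in zip(top, exps)]

    def forward(self, modality, token):
        """Gate-weighted sum of the selected experts' outputs."""
        out = [0.0] * self.dim
        for idx, gate in self.route(modality, token):
            y = matvec(self.experts[idx], token)
            out = [o + gate * v for o, v in zip(out, y)]
        return out


moe = ModalityRoutedMoE(dim=4, num_experts=6, modalities=["image", "ehr"])
token = [0.5, -1.0, 0.25, 2.0]
image_out = moe.forward("image", token)

# A new modality arrives later: only a router is added, the expert pool is unchanged.
moe.add_router("notes")
notes_gates = moe.route("notes", token)
```

In this sketch the expert count stays at six after `add_router`, so existing modalities produce the same outputs as before; how the actual framework protects earlier tasks is described only at the level of the abstract above.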

Xing Han, Shravan Chaudhari, Tanvi Ranade, Rama Chellappa, Suchi Saria • 2026

Related benchmarks

Task	Dataset	Result
Alzheimer's disease diagnosis	ADNI	AUC 78.8	60
Mortality Prediction	eICU	AUC-PRC 0.293	53
Phenotype prediction	MIMIC IV	AUROC 72.3	36
48-hour In-Hospital Mortality (48-IHM)	MIMIC IV	AUC 81.7	24
Readmission prediction	eICU	AUC-ROC 0.758	15
BIRADS classification	EMBED	AUROC 0.813	6
Density Classification	EMBED	AUROC 92.7	6
Length of Stay (LOS)	MIMIC IV	AUROC 82.1	6
Risk Assessment	EMBED	AUROC 73.8	6

