SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

About

Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually expand their capabilities, making Multimodal Continual Instruction Tuning (MCIT) essential. Recent methods leverage sparse expert routing to promote task specialization, but we find that the expert routing process suffers from drift as the data distribution evolves. For example, a grounding query that previously activated localization experts may instead be routed to irrelevant experts after learning OCR tasks. Meanwhile, the grounding-related experts can be overwritten by new tasks and lose their original functionality. These failures reflect two problems: router drift, where expert selection becomes inconsistent over time, and expert drift, where shared experts are overwritten across tasks. We therefore propose StAbilized Mixture-of-Experts (SAME) for MCIT. To address router drift, SAME stabilizes expert selection by decomposing routing dynamics into orthogonal subspaces and updating only task-relevant directions. To mitigate expert drift, SAME regulates expert updates via curvature-aware scaling using historical input covariance in a rehearsal-free manner. SAME also introduces adaptive expert activation to freeze selected experts during training, reducing redundant computation and cross-task interference. We also introduce a new benchmark to evaluate MCIT with long task sequences, and extensive experiments demonstrate SAME's state-of-the-art performance. Code is available at https://github.com/LAMDA-CL/Prism.
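
The abstract names two mechanisms: restricting router updates to directions orthogonal to what earlier tasks used, and scaling expert updates by historical input covariance. The NumPy sketch below is a generic illustration of those two ideas, not the SAME implementation. The function names (`top_subspace`, `project_out`, `curvature_scale`), the `energy` and `damping` parameters, and the toy data are all assumptions made for this example; the actual method may differ in every detail.

```python
import numpy as np

def top_subspace(cov, energy=0.95):
    """Eigenvectors of `cov` capturing an `energy` fraction of its spectrum.
    (Hypothetical helper; the threshold rule is an assumption.)"""
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]      # reorder to descending
    ratio = np.cumsum(vals) / np.sum(vals)
    k = int(np.searchsorted(ratio, energy)) + 1
    return vecs[:, :k]

def project_out(grad, basis):
    """Remove the part of `grad` (out_dim x in_dim) that lies in span(basis),
    so the update does not change outputs on inputs from that subspace."""
    return grad - (grad @ basis) @ basis.T

def curvature_scale(grad, cov, damping=0.1):
    """Shrink `grad` along input directions with high historical energy:
    grad @ (I + cov / damping)^-1. Unused directions pass through unchanged."""
    d = cov.shape[0]
    return damping * np.linalg.solve(cov + damping * np.eye(d), grad.T).T

# Toy data: earlier tasks only ever excited the first two input dimensions.
X_old = np.zeros((4, 4))
X_old[:, 0] = [1.0, -1.0, 1.0, -1.0]
X_old[:, 1] = [1.0, 1.0, -1.0, -1.0]
cov_old = X_old.T @ X_old / len(X_old)          # historical input covariance

grad = np.ones((3, 4))                          # stand-in for a new-task gradient
basis = top_subspace(cov_old)
router_update = project_out(grad, basis)        # router-style update
expert_update = curvature_scale(grad, cov_old)  # expert-style update
```

In this toy setup the projected update leaves outputs on the old inputs untouched, while the scaled update is damped only along the two dimensions the old inputs occupied. Only the covariance matrix is stored, not the old samples, which is one way to read "rehearsal-free".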

Zhen-Hao Xie, Jun-Tao Tang, Yu-Cheng Shi, Han-Jia Ye, De-Chuan Zhan, Da-Wei Zhou • 2026

Related benchmarks

Task | Dataset | Result | Rank
Continual Instruction Tuning | UCIT | Image-R Score: 89.91 | 30
Multimodal Continual Instruction Tuning | UCIT (Unified Continual Instruction Tuning) | ImgNet-R Score: 89.91 | 28
Multimodal Continual Instruction Tuning | TriGap v1 (test) | PMCVQA Score: 41.6 | 10
Multimodal Continual Instruction Tuning | TriGap | PMCVQA: 41.6 | 10
Visual Question Answering | TriGap | PMCVQA Score: 41.6 | 9