Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

About

Mixture-of-Experts (MoE) models typically fix the number of activated experts $k$ at both training and inference. However, real-world deployments often face heterogeneous hardware, fluctuating workloads, and diverse quality-latency requirements, while training separate models for each scenario is costly. Considering that MoE models already operate with sparse activation, adjusting the number of activated experts offers a natural path to serving diverse budgets with a single model. Yet, we find that activating more experts $k'$ ($> k$) at inference does not yield the expected gains. Instead, performance degrades rapidly after only a slight increase, a phenomenon we term the inference-time scaling wall. Further investigation reveals that this degradation stems from a lack of learned collaboration among experts. To address this, we introduce Elastic Mixture-of-Experts (EMoE), a novel training framework that enables MoE models to elastically vary the number of activated experts at inference. By simultaneously training experts to collaborate in diverse combinations and encouraging the router to make high-quality selections, EMoE ensures robust performance across inference budgets. Extensive experiments across four MoE architectures (7B-21B) and nine benchmarks show that EMoE significantly expands the effective scaling range to 2-3× the training-time $k$, while also achieving higher peak performance.
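The abstract's starting point is that top-k routing makes the number of activated experts a runtime knob. The sketch below illustrates generic top-k gating for a single token, with `k` passed at call time so the same weights can be served at $k$ or $k'$. It is not the EMoE training procedure: the function names, the softmax-then-renormalize gating convention, and the scalar toy experts are illustrative assumptions. Per the abstract, simply raising `k` on a conventionally trained model runs into the inference-time scaling wall; the code only shows where `k` enters the forward pass.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over the router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int) -> float:
    """Hypothetical top-k MoE forward pass for one token.

    k is a call-time argument, so one set of weights can be run with
    the training-time k or a larger inference-time k'.
    """
    if len(router_logits) != len(experts):
        raise ValueError("need one router logit per expert")
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    probs = softmax(router_logits)
    # Indices of the k experts with the highest router probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: renormalize the selected gate weights so they sum to 1.
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)


if __name__ == "__main__":
    # Toy experts: expert i scales its input by a fixed factor.
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, 1.0, -1.0]
    print(moe_forward(1.0, logits, experts, k=1))  # prints 2.0
    for k in (2, 3, 4):
        print(k, moe_forward(1.0, logits, experts, k=k))
```

With `k=1` only the highest-scoring expert contributes; each larger `k` blends in additional experts weighted by their renormalized router probabilities, which is exactly the kind of expert combination the abstract argues a standard model was never trained to handle.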

Naibin Gu, Zhenyu Zhang, Yuchen Feng, Yilong Chen, Peng Fu, Zheng Lin, Shuohuan Wang, Yu Sun, Hua Wu, Weiping Wang, Haifeng Wang • 2025

Related benchmarks

Task	Dataset	Metric	Result
General Knowledge	MMLU	Accuracy	72.93
Reasoning	ARC Easy	Accuracy	95.94
Commonsense Reasoning	Wino	Accuracy	68.51
Reasoning	ARC-C	--	--
Code Generation	HumanE	Accuracy	75.61
Open-domain Question Answering	NQ	Accuracy	27.87
Question Answering	TriQA	Accuracy	61.32
Commonsense Reasoning	HellaS	Accuracy	84.54
General Language Understanding	Average	Average Accuracy	72.93
