
AIMER: Calibration-Free Task-Agnostic MoE Pruning

About

Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token compute, but deployment still requires storing all experts, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, which makes pruning outcomes sensitive to the choice of calibration set and adds substantial preprocessing cost. We introduce AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that yields clear within-layer score separation and distinct expert stratification. Across 7B to 30B MoE language models at 25% and 50% pruning ratios over 16 benchmarks, AIMER consistently delivers competitive or stronger overall performance than state-of-the-art calibration-based expert pruning baselines while requiring only 0.22–1.27 seconds to score the experts.
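The acronym spells out the criterion: the absolute mean of each expert's weights divided by their root mean square, computed from the weights alone with no calibration data. Below is a minimal PyTorch sketch of one plausible reading of that criterion; the function names, the per-expert aggregation over all parameters, and the keep-the-highest-scores convention are illustrative assumptions, not the paper's exact definition.

```python
# Hypothetical sketch of an AIMER-style calibration-free expert score:
# score(expert) = mean(|W|) / RMS(W) over the expert's weights, ranked
# within one MoE layer. Names (aimer_score, rank_experts, prune_ratio)
# are assumptions for illustration; the paper's definition may differ.
import torch

def aimer_score(weights: torch.Tensor) -> float:
    """Absolute mean over root mean square of a flattened weight tensor."""
    w = weights.flatten().float()
    abs_mean = w.abs().mean()
    rms = w.pow(2).mean().sqrt()
    return (abs_mean / rms).item()

def rank_experts(experts: list[torch.nn.Module], prune_ratio: float = 0.25) -> list[int]:
    """Score every expert in one MoE layer from its parameters alone and
    return the sorted indices of the experts to keep (highest scores)."""
    scores = []
    for expert in experts:
        params = torch.cat([p.detach().flatten() for p in expert.parameters()])
        scores.append(aimer_score(params))
    n_keep = int(round(len(experts) * (1.0 - prune_ratio)))
    order = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])
```

Because the score depends only on stored weights, applying it per MoE layer is a single pass over the checkpoint, which is consistent with the sub-second scoring times reported in the abstract.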

Zongfang Liu, Shengkun Tang, Yifan Shen, Huan Wang, Xin Yuan • 2026

Related benchmarks

Task                               | Dataset                      | Result                     | Rank
Mathematical Reasoning             | GSM8K                        | Accuracy: 83               | 1362
Multiple-Choice QA                 | Multiple-Choice Suite        | MC Avg: 0.681              | 49
Multiple-Choice Question Answering | MC (test)                    | MC Avg: 72.3               | 46
Creative Writing                   | WildBench                    | WildBench Score: 40.2      | 45
Code Generation                    | Coding Eval+ LiveCode (test) | Eval+ Score: 84.5          | 32
Code Generation                    | EvalPlus (test)              | Eval+: 73.4                | 23
Mathematical Reasoning             | MATH-500                     | MATH-500 Score: 69.2       | 23
Creative Writing                   | WildBench (test)             | WildBench Score: 60.4      | 15
Math                               | GSM8K MATH-500 (test)        | GSM8K Accuracy: 89.5       | 15
Expert Pruning Efficiency          | MoE Models                   | Calibration Time (s): 0.22 | 6
