AIMER: Calibration-Free Task-Agnostic MoE Pruning
About
Mixture-of-Experts (MoE) language models increase parameter capacity without a proportional increase in per-token compute, but deployment still requires storing all experts, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, which makes pruning outcomes sensitive to the choice of calibration set and adds substantial preprocessing cost. We introduce AIMER (**A**bsolute mean over root mean square **IM**portance for **E**xpert **R**anking), a simple calibration-free criterion that yields clear within-layer score separation and distinct expert stratification. Across 7B to 30B MoE language models at 25% and 50% pruning ratios over 16 benchmarks, AIMER consistently delivers overall performance competitive with or stronger than state-of-the-art calibration-based expert pruning baselines, while requiring only 0.22--1.27 seconds to score the experts.
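As a rough illustration of a calibration-free criterion of this kind, the sketch below scores each expert directly from its weights, with no forward passes or calibration data. The exact aggregation AIMER uses is not spelled out here, so this is only a minimal sketch based on the acronym's expansion: each expert's score is the mean absolute value of its weights divided by their root mean square, and experts are ranked within a layer by that score. The function names and the "higher score = more important" convention are assumptions for illustration.

```python
import numpy as np

def aimer_score(weight: np.ndarray) -> float:
    """Absolute mean over root mean square of one expert's weight tensor.

    By the Cauchy-Schwarz inequality, mean(|w|) <= rms(w), so the score
    always lies in (0, 1] for a nonzero weight tensor.
    """
    abs_mean = np.abs(weight).mean()
    rms = np.sqrt(np.mean(weight ** 2))
    return float(abs_mean / rms)

def rank_experts(expert_weights):
    """Rank the experts of one MoE layer by score, best first (assumed order)."""
    scores = [aimer_score(w) for w in expert_weights]
    order = np.argsort(scores)[::-1]  # descending: keep top-ranked, prune the tail
    return scores, order.tolist()

# Toy layer with 4 experts; pruning 50% would drop the last two in `order`.
rng = np.random.default_rng(0)
experts = [rng.normal(size=(64, 64)) for _ in range(4)]
scores, order = rank_experts(experts)
```

Because the score depends only on stored weights, it can be computed once at load time in a single pass over the expert tensors, which is consistent with the sub-second scoring times reported above.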
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 83 | 1362 |
| Multiple-Choice QA | Multiple-Choice Suite | MC Avg | 0.681 | 49 |
| Multiple-Choice QA | MC (test) | MC Avg | 72.3 | 46 |
| Creative Writing | WildBench | WildBench Score | 40.2 | 45 |
| Code Generation | Coding Eval+ LiveCode (test) | Eval+ Score | 84.5 | 32 |
| Code Generation | EvalPlus (test) | Eval+ | 73.4 | 23 |
| Mathematical Reasoning | MATH 500 | MATH-500 Score | 69.2 | 23 |
| Creative Writing | WildBench (test) | WildBench Score | 60.4 | 15 |
| Mathematical Reasoning | GSM8K MATH-500 (test) | GSM8K Accuracy | 89.5 | 15 |
| Expert Pruning Efficiency | MoE Models | Calibration Time (s) | 0.22 | 6 |