# EvoESAP: Non-Uniform Expert Pruning for Sparse MoE

## About
Sparse Mixture-of-Experts (SMoE) language models achieve strong capability at low per-token compute, yet deployment remains constrained by memory footprint and throughput because the full expert pool must still be stored and served. Post-training expert pruning reduces this cost, but most methods focus on which experts to prune within each layer and default to a uniform layer-wise sparsity allocation, even though the layer-wise allocation can strongly affect performance. We decouple pruning into within-layer expert ranking and across-layer budget allocation, and introduce the **E**xpected **S**peculative **A**cceptance **P**roxy (**ESAP**), a speculative-decoding-inspired, teacher-forced metric that measures how well a pruned model matches the full model without costly autoregressive decoding. ESAP is bounded and stable, enabling cheap comparison of many candidates. Building on ESAP, we propose EvoESAP, an evolutionary search framework that finds an improved non-uniform layer-wise sparsity allocation under a fixed global budget while holding the within-layer pruning order fixed, making it a plug-and-play method for criteria such as Frequency, EAN, SEER, and REAP. Across 7B–30B SMoE LLMs at 25% and 50% sparsity, EvoESAP consistently discovers non-uniform allocations that improve open-ended generation (up to **+19.6%** on MATH-500 at 50% sparsity) while preserving competitive multiple-choice accuracy compared with uniform pruning at the same sparsity. Code is available at https://github.com/ZongfangLiu/EvoESAP.
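To illustrate the search described above, here is a minimal, self-contained sketch of evolutionary search over a non-uniform layer-wise allocation under a fixed global budget. The layer counts, budget, hill-climbing loop, and the toy fitness function (`toy_esap`, a stand-in for the real ESAP proxy) are all assumptions for illustration, not the repository's implementation.

```python
import random

NUM_LAYERS = 4
EXPERTS_PER_LAYER = 8
GLOBAL_BUDGET = 16  # e.g. 50% sparsity: keep 16 of 32 experts overall

def mutate(alloc):
    """Move one kept expert from one layer to another, preserving the global budget."""
    alloc = list(alloc)
    src = random.choice([i for i in range(NUM_LAYERS) if alloc[i] > 1])
    dst = random.choice([i for i in range(NUM_LAYERS)
                         if alloc[i] < EXPERTS_PER_LAYER and i != src])
    alloc[src] -= 1
    alloc[dst] += 1
    return alloc

def toy_esap(alloc):
    """Placeholder fitness (NOT the real ESAP): a made-up score that
    happens to prefer keeping more experts in later layers."""
    return sum((i + 1) * k for i, k in enumerate(alloc))

def evolve(generations=300, seed=0):
    """Hill-climb from the uniform allocation toward a better non-uniform one."""
    random.seed(seed)
    best = [GLOBAL_BUDGET // NUM_LAYERS] * NUM_LAYERS  # uniform baseline
    best_score = toy_esap(best)
    for _ in range(generations):
        cand = mutate(best)           # budget-preserving perturbation
        score = toy_esap(cand)        # cheap proxy evaluation
        if score > best_score:        # keep the better allocation
            best, best_score = cand, score
    return best, best_score
```

Because every mutation moves one expert between layers, the total number of kept experts never changes, so any allocation the search returns satisfies the global budget by construction; the within-layer pruning order is untouched, which is what makes the scheme plug-and-play with existing ranking criteria.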
## Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Multiple-Choice QA | Multiple-Choice Suite | MC Avg | 67.5 | 49 |
| Multiple-choice Question Answering | MC (test) | MC Avg | 70.6 | 46 |
| Mathematical Reasoning | GSM8K, MATH-500 (test) | Accuracy (GSM8K) | 91 | 43 |
| Code Generation | Eval+, LiveCode (test) | Eval+ Score | 87.2 | 32 |
| Coding | Coding Suite (EvalPlus & LiveCodeBench) | Eval+ Score | 83.8 | 26 |
| Open-ended Generation | WildBench | WildBench Score | 0.377 | 26 |
| Open-ended Generation | WildBench (test) | WildBench Score | 56.5 | 17 |