
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

About

Mixture-of-Experts (MoE) architectures in large language models (LLMs) deliver exceptional performance and reduced inference costs compared to dense LLMs. However, their large parameter counts result in prohibitive memory requirements, limiting practical deployment. Existing pruning methods focus primarily on expert-level pruning, but this coarse granularity often leads to substantial accuracy degradation. In this work, we introduce HEAPr, a novel pruning algorithm that decomposes experts into smaller, indivisible atomic experts, enabling more precise and flexible pruning. To measure the importance of each atomic expert, we leverage second-order information following principles similar to Optimal Brain Surgeon (OBS) theory. To address the computational and storage challenges posed by second-order information, HEAPr exploits the inherent properties of atomic experts to transform the second-order information of expert parameters into that of atomic expert parameters, and further simplifies it to the second-order information of atomic expert outputs. This reduces the space complexity from $O(d^4)$, where $d$ is the model's dimensionality, to $O(d^2)$. HEAPr requires only two forward passes and one backward pass on a small calibration set to compute the importance of atomic experts. Extensive experiments on MoE models, including the DeepSeek-MoE and Qwen-MoE families, demonstrate that HEAPr outperforms existing expert-level pruning methods across a wide range of pruning ratios and benchmarks. Specifically, HEAPr achieves nearly lossless compression at pruning ratios of 20%–25% in most models, while also reducing FLOPs by nearly 20%. The code can be found at [https://github.com/LLIKKE/HEAPr](https://github.com/LLIKKE/HEAPr).

Ke Li, Zheng Yang, Zhongbin Zhou, Feng Xue, Zhonglin Jiang, Wenxiao Wang • 2025
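
The abstract outlines the overall recipe: split each expert into atomic experts, then score each one with OBS-style second-order information computed in the expert's output space from a small calibration set. The snippet below is a minimal illustrative sketch of that idea in PyTorch, not the authors' implementation (see the linked repository for that): it treats each intermediate channel of a SwiGLU expert as an "atomic expert" and approximates the output-space Hessian Fisher-style from output gradients. The names (`ExpertMLP`, `score_atomic_experts`) and the exact scoring formula are assumptions made for this example.

```python
# Minimal sketch (not the official HEAPr code): OBS-style importance of
# "atomic experts" scored in output space. Here an atomic expert is one
# intermediate channel of a SwiGLU expert MLP; the output-space Hessian is
# approximated Fisher-style from gradients of the loss w.r.t. the expert output.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """A SwiGLU-style expert, as used in DeepSeek-MoE / Qwen-MoE blocks."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def atomic_outputs(self, x: torch.Tensor) -> torch.Tensor:
        """Per-channel contributions to the expert output.

        Returns (tokens, d_ff, d_model); summing over the d_ff axis
        recovers the full expert output down(silu(gate(x)) * up(x)).
        """
        h = F.silu(self.gate(x)) * self.up(x)                        # (T, d_ff)
        return h.unsqueeze(-1) * self.down.weight.t().unsqueeze(0)   # (T, d_ff, d_model)


def score_atomic_experts(expert: ExpertMLP,
                         calib_x: torch.Tensor,
                         out_grad: torch.Tensor) -> torch.Tensor:
    """Importance of each atomic expert on a calibration batch.

    calib_x:  (T, d_model) hidden states routed to this expert.
    out_grad: (T, d_model) gradient of the loss w.r.t. the expert output,
              collected once with a backward pass over the calibration set.
    The removal cost of channel i is approximated as 0.5 * o_i^T H o_i, with
    H a Fisher-style output-space Hessian (d_model x d_model, i.e. O(d^2)
    storage instead of O(d^4) for a parameter-space Hessian).
    """
    with torch.no_grad():
        H = out_grad.t() @ out_grad / out_grad.shape[0]              # (d_model, d_model)
        o = expert.atomic_outputs(calib_x)                           # (T, d_ff, d_model)
        return 0.5 * torch.einsum("tid,de,tie->i", o, H, o)          # (d_ff,)


if __name__ == "__main__":
    torch.manual_seed(0)
    expert = ExpertMLP(d_model=64, d_ff=256)
    x = torch.randn(128, 64)   # toy calibration activations
    g = torch.randn(128, 64)   # stand-in for real output gradients
    scores = score_atomic_experts(expert, x, g)
    keep = scores.argsort(descending=True)[: int(0.8 * 256)]  # prune the lowest-scoring 20%
    print(keep.shape)
```

In the paper's formulation, the statistics needed for these scores are gathered with just two forward passes and one backward pass over the calibration set, and pruning is then applied across all experts at the chosen ratio.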

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Commonsense Reasoning | HellaSwag | Accuracy | 63 | 1891 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 5.92 | 1624 |
| Commonsense Reasoning | WinoGrande | Accuracy | 74 | 1085 |
| Question Answering | ARC Challenge | Accuracy | 49 | 906 |
| Question Answering | ARC Easy | Accuracy | 76 | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 81 | 572 |
| Mathematical Reasoning | MathQA | Accuracy | 50 | 305 |
| Language Modeling | C4 | C4 Loss | 9.85 | 121 |
| Language Modeling | PennTreeBank (PTB) | PPL | 9.34 | 87 |
| Zero-shot Evaluation | ARC-Easy, ARC-Challenge, OpenBookQA, WinoGrande, PIQA, HellaSwag, MathQA, RTE, BoolQ (zero-shot) | Mean Accuracy | 60.62 | 59 |
