ActTail: Global Activation Sparsity in Large Language Models

About

Activation sparsity is a promising approach for accelerating large language model (LLM) inference by reducing computation and memory movement. However, existing activation sparsity methods typically apply uniform sparsity across projections, ignoring the heterogeneous statistical properties of Transformer weights and thereby amplifying performance degradation. In this paper, we propose ActTail, a TopK magnitude-based activation sparsity method with global activation sparsity allocation grounded in Heavy-Tailed Self-Regularization (HT-SR) theory. Specifically, we capture this heterogeneity via the heavy-tail exponent computed from each projection's empirical spectral density (ESD), which is used as a quantitative indicator to assign projection-specific sparsity budgets. Importantly, we provide a theoretical analysis that establishes an explicit relationship between the activation sparsity ratio and the heavy-tail exponent under the HT-SR regime, offering principled guidance for sparsity allocation beyond heuristic design. Experiments on LLaMA and Mistral models show that our method improves both perplexity and downstream task performance at high sparsity compared to uniform allocation. At 80% sparsity, perplexity is reduced by 21.8% on LLaMA-2-7B, 40.1% on LLaMA-2-13B, and 9.4% on Mistral-7B.
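
The abstract does not spell out the exact tail estimator or the rule mapping exponents to budgets, so the sketch below is only an illustration of the overall pipeline under stated assumptions: estimate a heavy-tail exponent from each projection's empirical spectral density (here via a Clauset-style MLE on the largest eigenvalues of W^T W), turn the exponents into per-projection sparsity budgets (here a hypothetical linear rule centred on the global target), and apply TopK magnitude sparsity to the activations. The function names (`heavy_tail_exponent`, `allocate_sparsity`, `topk_activations`) and the specific alpha-to-budget mapping are placeholders, not the paper's formulas.

```python
import torch

def heavy_tail_exponent(weight: torch.Tensor, tail_frac: float = 0.1) -> float:
    """Estimate a power-law (heavy-tail) exponent from the ESD of a weight matrix,
    i.e. the eigenvalues of W^T W, using a Clauset-style MLE on the tail.
    (Illustrative; the paper's exact fitting procedure is not given here.)"""
    eigs = torch.linalg.svdvals(weight.float()) ** 2   # eigenvalues of W^T W
    eigs, _ = torch.sort(eigs, descending=True)
    k = max(int(tail_frac * eigs.numel()), 2)          # size of the fitted tail
    tail = eigs[:k]
    x_min = tail[-1]
    # alpha_hat = 1 + k / sum_i log(lambda_i / lambda_min) over the tail
    return 1.0 + k / torch.log(tail / x_min).sum().item()

def allocate_sparsity(alphas: list[float], target: float = 0.8, spread: float = 0.1) -> list[float]:
    """Hypothetical mapping from per-projection exponents to sparsity budgets:
    deviations of alpha from its mean shift the budget around the global target,
    so more heavy-tailed (smaller-alpha) projections keep more activations."""
    mean_alpha = sum(alphas) / len(alphas)
    return [min(max(target + spread * (a - mean_alpha), 0.0), 0.99) for a in alphas]

def topk_activations(x: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out all but the top-k largest-magnitude activations per token."""
    k = max(int(round((1.0 - sparsity) * x.shape[-1])), 1)
    thresh = x.abs().topk(k, dim=-1).values[..., -1:]   # k-th largest magnitude per token
    return torch.where(x.abs() >= thresh, x, torch.zeros_like(x))

# Toy usage: per-projection budgets from random weights, then sparsify one activation tensor.
weights = {"q_proj": torch.randn(512, 512), "up_proj": torch.randn(2048, 512)}
alphas = [heavy_tail_exponent(w) for w in weights.values()]
budgets = allocate_sparsity(alphas, target=0.8)
acts = torch.randn(4, 512)                              # (tokens, hidden)
sparse_acts = topk_activations(acts, budgets[0])
```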

Wenwen Hou, Xinyuan Song, Shiwei Liu • 2026

Related benchmarks

Task: Language Modeling
Dataset: WikiText2
Result: Perplexity 5.14
Rank: 2839

Task: Natural Language Understanding and Reasoning
Dataset: MMLU, ARC-c, HellaSwag, BOOLQ, PIQA, WinoGrande (zero-shot)
Result: Average Score (Zero-shot) 62.18
Rank: 20
