IMPACT: Importance-Aware Activation Space Reconstruction

About

Large language models (LLMs) achieve strong performance across diverse domains but remain difficult to deploy in resource-constrained environments due to their size. Low-rank compression is a common remedy, typically minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. In contrast, LLM activations exhibit a more pronounced low-rank structure, motivating approaches that minimize activation reconstruction error. This shift alone, however, is not sufficient: different activation dimensions contribute unequally to model performance, and treating them uniformly can lead to accuracy loss. We introduce IMPACT, an importance-aware activation reconstruction framework that links compression to its effect on model performance. IMPACT formulates compression as an optimization problem that integrates activation structure with gradient-based importance, deriving a closed-form solution where reconstruction bases arise from an importance-weighted activation covariance matrix. This yields low-rank compression explicitly optimized for accuracy preservation. Experiments across multiple models and tasks demonstrate that IMPACT achieves up to 55.4% greater model size reduction while maintaining accuracy comparable to or better than state-of-the-art baselines.

Md Mokarram Chowdhury, Daniel Agyei Asante, Ernie Chang, Yang Li • 2025
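
The abstract describes reconstruction bases that arise from an importance-weighted activation covariance matrix, but it does not give the exact objective. The following is therefore only a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes a single linear layer `y = W x`, assumes importance is measured per output dimension as the root-mean-square of the loss gradient with respect to that dimension, and minimizes the importance-weighted activation reconstruction error. The function name `importance_weighted_lowrank` and all of its parameters are hypothetical.

```python
import numpy as np

def importance_weighted_lowrank(W, X, G, rank, eps=1e-8):
    """Factor W (d_out x d_in) into A (d_out x rank) @ B (rank x d_in).

    X: (n, d_in) calibration inputs to the layer.
    G: (n, d_out) gradients of the loss w.r.t. the layer outputs.
    The importance definition below is an assumption made for illustration.
    """
    Y = X @ W.T                                  # activations, shape (n, d_out)
    s = np.sqrt(np.mean(G ** 2, axis=0) + eps)   # assumed per-dimension importance scale
    Z = Y * s                                    # scale each activation dimension
    C = Z.T @ Z / Z.shape[0]                     # importance-weighted covariance
    _, eigvecs = np.linalg.eigh(C)               # eigenvalues come back in ascending order
    U = eigvecs[:, -rank:]                       # top-`rank` eigenvectors, (d_out, rank)
    A = U / s[:, None]                           # undo the scaling on the output side
    B = (U.T * s) @ W                            # project the scaled weights, (rank, d_in)
    return A, B

# Hypothetical usage on random data.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 5))
X = rng.standard_normal((200, 5))
G = rng.standard_normal((200, 6)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0, 0.1])
A, B = importance_weighted_lowrank(W, X, G, rank=2)
print(A.shape, B.shape)
```

Under these assumptions, the factors store `rank * (d_out + d_in)` values instead of `d_out * d_in`, and the top eigenvectors of the weighted covariance give the rank-limited projection with the smallest importance-weighted activation error on the calibration data.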

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval (test)	--	701
Code Generation	MBPP (test)	--	411
Code Generation	HumanEval	Accuracy 45.1	224
Mathematical Reasoning	GSM8K	GSM8K Accuracy (%) 66.4	220
Mathematical Reasoning	MATH	--	55
Code Generation	MBPP	MBPP Accuracy 59.8	30
Mathematical Reasoning	GSM8K	GSM8K Accuracy 72.7	30
Mathematical Reasoning	Mathematical Reasoning Task	Throughput (Token/s) 616	24
