SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

About

Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their billion-scale parameter counts pose deployment challenges. Although existing methods attempt to reduce the size of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To enable efficient and affordable model slimming, we propose SoLA, a novel training-free compression method for LLMs that leverages Soft activation sparsity and Low-rAnk decomposition. Based on our analysis of activation patterns in the feed-forward network (FFN) of modern LLMs, SoLA identifies and retains the minority of components that contribute significantly to inference, while compressing the majority through low-rank decomposition. To alleviate the decomposition loss, SoLA is equipped with an adaptive component-wise low-rank allocation strategy that assigns appropriate truncation positions to different weight matrices. We conduct extensive experiments on LLaMA-2-7B/13B/70B and Mistral-7B models across a variety of benchmarks. Without post-training, SoLA substantially improves both language modeling and downstream task accuracy. For example, at a 30% compression rate on LLaMA-2-70B, SoLA surpasses the state-of-the-art method, reducing perplexity from 6.95 to 4.44 and improving downstream task accuracy by 10%.
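The core idea above, keeping a small set of important components dense and approximating the rest with a low-rank factorization, can be illustrated with a short NumPy sketch. It rests on assumptions the abstract does not confirm: "components" are taken to be input channels of one weight matrix (for instance an FFN down-projection), importance is scored by mean absolute activation on calibration data, and a single fixed rank stands in for SoLA's adaptive component-wise rank allocation. The names `split_and_compress`, `apply_compressed` and `param_count` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def split_and_compress(W, acts, keep_ratio, rank):
    """Hypothetical sketch: keep high-activation input channels of W dense and
    replace the remaining columns with a rank-`rank` truncated SVD.

    W:    (d_out, d_in) weight, applied as y = x @ W.T
    acts: (n_tokens, d_in) calibration activations that feed W
    """
    d_in = W.shape[1]
    # Assumed importance score: mean |activation| per input channel.
    scores = np.abs(acts).mean(axis=0)
    k = int(round(keep_ratio * d_in))
    order = np.argsort(-scores)        # channels by descending score
    keep_idx = np.sort(order[:k])      # minority kept as dense columns
    rest_idx = np.sort(order[k:])      # majority compressed to low rank
    W_keep = W[:, keep_idx]
    # Truncated SVD of the remaining columns: W_rest is approximated by A @ B.
    U, S, Vt = np.linalg.svd(W[:, rest_idx], full_matrices=False)
    A = U[:, :rank] * S[:rank]         # (d_out, rank)
    B = Vt[:rank, :]                   # (rank, d_in - k)
    return keep_idx, rest_idx, W_keep, A, B


def apply_compressed(x, keep_idx, rest_idx, W_keep, A, B):
    """Approximate x @ W.T as a dense part plus a low-rank part."""
    dense = x[:, keep_idx] @ W_keep.T
    low_rank = (x[:, rest_idx] @ B.T) @ A.T
    return dense + low_rank


def param_count(d_out, d_in, k, rank):
    """Stored parameters: k dense columns plus the two low-rank factors."""
    return d_out * k + rank * (d_out + (d_in - k))


# Usage on synthetic data (a random Gaussian W is not low-rank, so this
# only exercises the mechanics, not the quality of the approximation).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 1024))
acts = rng.standard_normal((512, 1024))
acts[:, :20] *= 50.0                   # make a few channels dominate
parts = split_and_compress(W, acts, keep_ratio=0.02, rank=64)
x = acts[:8]
y_approx = apply_compressed(x, *parts)
print(param_count(256, 1024, len(parts[0]), 64) / W.size)
```

The printed value is stored parameters divided by dense parameters for this single matrix (about 0.33 here, since 20 channels are kept and the rank is 64). Splitting by columns keeps the dense and low-rank paths independent, so the two matrix products can simply be summed at inference time.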

Xinhao Huang, You-Liang Huang, Zeyi Wen • 2026

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	WinoGrande	--	--	1442
Commonsense Reasoning	HellaSwag	Accuracy	63.32	711
Language Modeling	WikiText2 (val)	Perplexity (PPL)	4.06	423
Science Question Answering	ARC Challenge	Accuracy	39.76	354
Reading Comprehension	BoolQ	Accuracy	66.09	228
Science Question Answering	ARC Easy	Accuracy	69.99	162
Massive Multitask Language Understanding	MMLU	Accuracy	44.2	129
Physical Commonsense Reasoning	PIQA	Accuracy	73.67	78
Language Understanding and Question Answering	Downstream NLP Suite (MMLU, BoolQ, PIQA, WinoGrande, HellaSwag, ARC-e, ARC-c, OBQA)	Average Accuracy	68.92	35
General Performance Averaging	Consolidated Task Suite Avg.	Average Accuracy	58.1	23

