
3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs

About

Sparse plus Low-Rank $(\mathbf{S} + \mathbf{LR})$ decomposition of Large Language Models (LLMs) has emerged as a promising direction in model compression, aiming to decompose pre-trained model weights into a sum of sparse and low-rank matrices $(\mathbf{W} \approx \mathbf{S} + \mathbf{LR})$. Despite recent progress, existing methods often suffer from substantial performance degradation compared to dense models. In this work, we introduce 3BASiL-TM, an efficient one-shot post-training method for $(\mathbf{S} + \mathbf{LR})$ decomposition of LLMs that addresses this gap. Our approach first introduces a novel 3-Block Alternating Direction Method of Multipliers (ADMM) algorithm, termed 3BASiL, that minimizes the layer-wise reconstruction error with convergence guarantees. We then design an efficient transformer-matching (TM) refinement step that jointly optimizes the sparse and low-rank components across transformer layers. This step minimizes a novel memory-efficient loss that aligns outputs at the transformer level. Notably, the TM procedure is universal, as it can enhance any $(\mathbf{S} + \mathbf{LR})$ decomposition, including pure sparsity. Our numerical experiments show that 3BASiL-TM reduces the WikiText2 perplexity gap relative to the dense LLaMA-8B model by over 30% under a (2:4 Sparse + 64 LR) configuration, compared to prior methods. Moreover, our method achieves over 2.5x faster compression runtime on an A100 GPU compared to the SOTA $(\mathbf{S} + \mathbf{LR})$ method. Our code is available at https://github.com/mazumder-lab/3BASiL.
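To make the decomposition objective concrete, the sketch below illustrates the generic $\mathbf{W} \approx \mathbf{S} + \mathbf{LR}$ split with a naive alternating scheme: magnitude pruning for the sparse part, truncated SVD for the low-rank part. This is only a baseline illustration of the problem setup, not the paper's 3-block ADMM or the TM refinement; the function name, iteration count, and pruning rule are assumptions for exposition.

```python
import numpy as np

def sparse_plus_low_rank(W, sparsity=0.5, rank=8, n_iter=20):
    """Naive alternating S + LR decomposition (illustration only).

    Each iteration: S keeps the largest-magnitude entries of the
    residual W - LR; LR is the best rank-`rank` approximation
    (truncated SVD) of W - S. This is a simple baseline, NOT the
    3BASiL ADMM algorithm described in the abstract.
    """
    S = np.zeros_like(W)
    LR = np.zeros_like(W)
    k = int(sparsity * W.size)  # number of entries forced to zero in S
    for _ in range(n_iter):
        # Sparse step: magnitude pruning of the residual W - LR
        R = W - LR
        thresh = np.partition(np.abs(R).ravel(), k)[k]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
        # Low-rank step: truncated SVD of the residual W - S
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        LR = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return S, LR

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
S, LR = sparse_plus_low_rank(W, sparsity=0.5, rank=8)
rel_err = np.linalg.norm(W - S - LR) / np.linalg.norm(W)
```

In practice, 3BASiL minimizes a layer-wise reconstruction error (and, with TM, a transformer-level alignment loss) rather than this plain Frobenius objective, and supports structured patterns such as 2:4 sparsity.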

Mehdi Makni, Xiang Meng, Rahul Mazumder • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 56.91 | 1891 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 8.25 | 1624 |
| Language Modeling | C4 | Perplexity | 12.17 | 1422 |
| Commonsense Reasoning | WinoGrande | Accuracy | 60.62 | 1085 |
| Language Modeling | C4 | Perplexity | 11.53 | 1071 |
| Language Modeling | PTB | Perplexity | 16.52 | 1034 |
| Question Answering | ARC Challenge | Accuracy | 32.94 | 906 |
| Question Answering | ARC Easy | Accuracy | 56.86 | 597 |
| Natural Language Inference | RTE | Accuracy | 59.57 | 448 |
| Question Answering | PIQA | Accuracy | 72.74 | 374 |

Showing 10 of 19 rows.
