
3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs

About

Sparse plus Low-Rank $(\mathbf{S} + \mathbf{LR})$ decomposition of Large Language Models (LLMs) has emerged as a promising direction in model compression, aiming to decompose pre-trained model weights into a sum of sparse and low-rank matrices $(\mathbf{W} \approx \mathbf{S} + \mathbf{LR})$. Despite recent progress, existing methods often suffer from substantial performance degradation compared to dense models. In this work, we introduce 3BASiL-TM, an efficient one-shot post-training method for $(\mathbf{S} + \mathbf{LR})$ decomposition of LLMs that addresses this gap. Our approach first introduces a novel 3-Block Alternating Direction Method of Multipliers (ADMM) algorithm, termed 3BASiL, that minimizes the layer-wise reconstruction error with convergence guarantees. We then design an efficient transformer-matching (TM) refinement step that jointly optimizes the sparse and low-rank components across transformer layers. This step minimizes a novel memory-efficient loss that aligns outputs at the transformer level. Notably, the TM procedure is universal, as it can enhance any $(\mathbf{S} + \mathbf{LR})$ decomposition, including pure sparsity. Our numerical experiments show that 3BASiL-TM reduces the WikiText2 perplexity gap relative to the dense LLaMA-8B model by over 30% under a (2:4 Sparse + 64 LR) configuration, compared to prior methods. Moreover, our method achieves over 2.5x faster compression runtime on an A100 GPU compared to the SOTA $(\mathbf{S} + \mathbf{LR})$ method. Our code is available at https://github.com/mazumder-lab/3BASiL.
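To make the decomposition objective concrete, the sketch below illustrates the generic $\mathbf{W} \approx \mathbf{S} + \mathbf{LR}$ split with a naive alternating scheme: magnitude pruning for the sparse part, truncated SVD for the low-rank part. This is only a baseline illustration of the problem setup, not the paper's 3-block ADMM or the TM refinement; the function name, iteration count, and pruning rule are assumptions for exposition.

```python
import numpy as np

def sparse_plus_low_rank(W, sparsity=0.5, rank=8, n_iter=20):
    """Naive alternating S + LR decomposition (illustration only).

    Each iteration: S keeps the largest-magnitude entries of the
    residual W - LR; LR is the best rank-`rank` approximation
    (truncated SVD) of W - S. This is a simple baseline, NOT the
    3BASiL ADMM algorithm described in the abstract.
    """
    S = np.zeros_like(W)
    LR = np.zeros_like(W)
    k = int(sparsity * W.size)  # number of entries forced to zero in S
    for _ in range(n_iter):
        # Sparse step: magnitude pruning of the residual W - LR
        R = W - LR
        thresh = np.partition(np.abs(R).ravel(), k)[k]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
        # Low-rank step: truncated SVD of the residual W - S
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        LR = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return S, LR

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
S, LR = sparse_plus_low_rank(W, sparsity=0.5, rank=8)
rel_err = np.linalg.norm(W - S - LR) / np.linalg.norm(W)
```

In practice, 3BASiL minimizes a layer-wise reconstruction error (and, with TM, a transformer-level alignment loss) rather than this plain Frobenius objective, and supports structured patterns such as 2:4 sparsity.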

Mehdi Makni, Xiang Meng, Rahul Mazumder • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 56.91 | 1891 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 8.25 | 1624 |
| Language Modeling | C4 | Perplexity | 12.17 | 1422 |
| Commonsense Reasoning | WinoGrande | Accuracy | 60.62 | 1085 |
| Language Modeling | C4 | Perplexity | 11.53 | 1071 |
| Language Modeling | PTB | Perplexity | 16.52 | 1034 |
| Question Answering | ARC Challenge | Accuracy | 32.94 | 906 |
| Question Answering | ARC Easy | Accuracy | 56.86 | 597 |
| Natural Language Inference | RTE | Accuracy | 59.57 | 448 |
| Question Answering | PIQA | Accuracy | 72.74 | 374 |

Showing 10 of 19 rows.
