
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

About

Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for one-shot shape-preserving compression algorithms. We apply NoWag to compress Llama-2 (7B, 13B, 70B) and Llama-3 (8B, 70B) models using two popular shape-preserving techniques: vector quantization (NoWag-VQ) and unstructured/semi-structured pruning (NoWag-P). Our results show that NoWag-VQ significantly outperforms state-of-the-art one-shot vector quantization methods, while NoWag-P performs competitively against leading pruning techniques. These findings highlight underlying commonalities between these compression paradigms and suggest promising directions for future research. Our code is available at https://github.com/LawrenceRLiu/NoWag.
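To make the idea of activation-guided, shape-preserving compression concrete, here is a minimal sketch of one-shot unstructured pruning that scores each weight by its magnitude scaled by the RMS activation of its input channel, then zeroes the lowest-scoring fraction. This is an illustrative example in the spirit of the abstract, not NoWag's exact algorithm; the function name, the scoring rule, and the calibration setup are assumptions for illustration.

```python
import numpy as np

def activation_guided_prune(W, X, sparsity=0.5):
    """Illustrative one-shot unstructured pruning (a sketch, not NoWag's
    actual method): score = |weight| * RMS activation of its input channel,
    then zero the lowest-scoring `sparsity` fraction of weights.
    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration activations."""
    act_norm = np.sqrt((X ** 2).mean(axis=0))       # per-input-channel RMS
    scores = np.abs(W) * act_norm                   # broadcast over output rows
    k = int(sparsity * W.size)
    threshold = np.partition(scores.ravel(), k)[k]  # k-th smallest score
    mask = scores >= threshold                      # keep high-scoring weights
    return W * mask, mask                           # shape of W is preserved

# Usage: prune a random layer to 50% sparsity with random calibration data.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(32, 16))
W_pruned, mask = activation_guided_prune(W, X, sparsity=0.5)
print(W_pruned.shape, mask.mean())
```

Because the weight matrix keeps its original shape (pruned entries are simply zero), the compressed layer drops into the model without any architectural change, which is the sense in which such methods are "shape preserving."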

Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang• 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 15.02 | 2839 |
| Commonsense Reasoning | WinoGrande | -- | -- | 1085 |
| Question Answering | ARC Easy | -- | -- | 597 |
| Multitask Language Understanding | MMLU | Accuracy | 78.93 | 413 |
| Question Answering | PIQA | Accuracy | 74.42 | 374 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy | 59.31 | 350 |
| Mathematical Reasoning | MathQA | Accuracy | 23.73 | 305 |
| Sentence Completion | HellaSwag | Accuracy | 70.08 | 276 |
| Logical Reasoning | BBH | Accuracy | 56.11 | 201 |
| Multiple-choice Question Answering | ARC Easy | Accuracy | 72.73 | 188 |

Showing 10 of 28 rows.
