NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

About

Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for one-shot shape preserving compression algorithms. We apply NoWag to compress Llama-2 (7B, 13B, 70B) and Llama-3 (8B, 70B) models using two popular shape-preserving techniques: vector quantization (NoWag-VQ) and unstructured/semi-structured pruning (NoWag-P). Our results show that NoWag-VQ significantly outperforms state-of-the-art one-shot vector quantization methods, while NoWag-P performs competitively against leading pruning techniques. These findings highlight underlying commonalities between these compression paradigms and suggest promising directions for future research. Our code is available at https://github.com/LawrenceRLiu/NoWag
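To make "shape preserving" concrete, here is a minimal sketch of the two pruning patterns the abstract names, unstructured and semi-structured (N:M). It uses a plain magnitude criterion as a stand-in, which is an assumption for illustration and not NoWag's normalized, activation-guided criterion; the function names `prune_unstructured` and `prune_n_of_m` are hypothetical and do not come from the NoWag repository.

```python
import numpy as np


def prune_unstructured(weights, sparsity):
    """Zero the smallest-magnitude entries anywhere in the matrix.

    Assumption: plain magnitude is the importance score (illustration only).
    """
    k = int(round(sparsity * weights.size))  # number of entries to drop
    if k == 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    drop = np.argpartition(flat, k - 1)[:k]  # indices of the k smallest magnitudes
    mask = np.ones(weights.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)


def prune_n_of_m(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m consecutive
    entries along each row (semi-structured N:M sparsity)."""
    rows, cols = weights.shape
    if cols % m != 0:
        raise ValueError("number of columns must be divisible by m")
    groups = weights.reshape(rows, cols // m, m)
    order = np.argsort(np.abs(groups), axis=-1)  # ascending magnitude per group
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, order[..., : m - n], False, axis=-1)  # drop the m-n smallest
    return (groups * mask).reshape(rows, cols)


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
W_unstructured = prune_unstructured(W, sparsity=0.5)
W_2_of_4 = prune_n_of_m(W, n=2, m=4)
```

In both cases the pruned matrix has the same shape as the original, so it can replace the dense weight without changing the layer's interface; only the pattern of zeros differs.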

Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity 15.02 | 3785 |
| Commonsense Reasoning | WinoGrande | -- | 1442 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy 59.31 | 711 |
| Question Answering | ARC Challenge | Accuracy (ARC) 35.53 | 598 |
| Question Answering | ARC Easy | -- | 597 |
| Multitask Language Understanding | MMLU | Accuracy 78.93 | 520 |
| Question Answering | PIQA | Accuracy 74.42 | 505 |
| Sentence Completion | HellaSwag | Accuracy 70.08 | 364 |
| Mathematical Reasoning | MathQA | Accuracy 23.73 | 354 |
| Language Modeling | WikiText2 | Perplexity 5.17 | 277 |
Showing 10 of 28 rows
