
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search

About

The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the importance of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating dynamic compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for dynamic compression of Llama, Mistral, and Phi models, setting new benchmarks for structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths. Our code is available at https://github.com/IST-DASLab/EvoPress.
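To make the idea of searching over per-layer compression levels under a fixed global budget more concrete, here is a minimal sketch of a generic evolutionary search. It is not the EvoPress implementation: the function names (`mutate`, `evolutionary_search`, `toy_loss`), the budget-preserving mutation rule, and the toy loss with a layer interaction term are all assumptions made for illustration only.

```python
import random


def mutate(profile, levels, rng):
    """Raise one layer's compression level and lower another's by one step.

    The sum of levels stays constant, so the global budget is preserved.
    (Hypothetical mutation rule, chosen for illustration.)
    """
    child = list(profile)
    ups = [i for i, v in enumerate(child) if v < levels - 1]
    downs = [i for i, v in enumerate(child) if v > 0]
    if not ups or not downs:
        return child
    i = rng.choice(ups)
    candidates = [j for j in downs if j != i]
    if not candidates:
        return child
    j = rng.choice(candidates)
    child[i] += 1
    child[j] -= 1
    return child


def evolutionary_search(fitness, n_layers, levels, start_level,
                        generations=200, offspring=8, seed=0):
    """Keep a single parent; replace it when the best offspring is no worse."""
    rng = random.Random(seed)
    parent = [start_level] * n_layers
    parent_loss = fitness(parent)
    for _ in range(generations):
        children = [mutate(parent, levels, rng) for _ in range(offspring)]
        best = min(children, key=fitness)
        best_loss = fitness(best)
        if best_loss <= parent_loss:
            parent, parent_loss = best, best_loss
    return parent, parent_loss


def toy_loss(profile):
    """Toy stand-in for model error; a real setup would evaluate the compressed model."""
    weights = [5.0, 1.0, 1.0, 0.2, 0.2, 3.0]  # made-up per-layer sensitivities
    base = sum(w * s * s for w, s in zip(weights, profile))
    # Interaction term: layers 1 and 2 do not contribute independently.
    interaction = -0.5 * profile[1] * profile[2]
    return base + interaction


if __name__ == "__main__":
    profile, loss = evolutionary_search(toy_loss, n_layers=6, levels=5, start_level=2)
    print(profile, loss)
```

Because the search only compares whole profiles through `fitness`, it makes no assumption that layers contribute independently to the error, which is the property the abstract highlights.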

Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL 5.48 | 2333 |
| Language Modeling | WikiText-2 | Perplexity (PPL) 7.64 | 2320 |
| Language Modeling | C4 | Perplexity 17.58 | 1688 |
| Language Modeling | C4 | Perplexity 12.53 | 1565 |
| Language Modeling | C4 (val) | PPL 7.65 | 737 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) 5.42 | 423 |
| Language Modeling | WikiText2 v1 (test) | Perplexity 5.74 | 383 |
| Language Modeling | Wiki | Perplexity (PPL) 28.76 | 298 |
| Zero-shot Reasoning | Reasoning Suite Zero-shot (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test) | Average Accuracy 64.67 | 297 |
| Language Modeling | WikiText2 | Perplexity 86.94 | 277 |
Showing 10 of 14 rows
