
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search

About

The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the importance of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating dynamic compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for dynamic compression of Llama, Mistral, and Phi models, setting new benchmarks for structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths. Our code is available at https://github.com/IST-DASLab/EvoPress.
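The core idea, searching over per-layer compression profiles under a fixed global budget rather than ranking layers independently, can be illustrated with a toy hill-climb. This is a hypothetical sketch, not the authors' implementation: the per-layer sensitivities and the quadratic fitness function below are invented for illustration, and the mutation simply transfers one unit of budget between two layers so the global constraint is always preserved.

```python
import random

def evolutionary_search(num_layers, levels, target_sum, fitness,
                        generations=200, offspring=8, seed=0):
    """Toy evolutionary hill-climb over per-layer compression levels.

    A candidate is a list of per-layer level indices in [0, levels) whose
    sum equals target_sum (the global compression budget). Each mutation
    moves one unit of budget from one random layer to another, so every
    candidate satisfies the budget constraint by construction.
    """
    rng = random.Random(seed)
    # Start from a uniform profile that meets the budget exactly.
    base, rem = divmod(target_sum, num_layers)
    parent = [base + (1 if i < rem else 0) for i in range(num_layers)]
    best_fit = fitness(parent)
    for _ in range(generations):
        for _ in range(offspring):
            child = parent[:]
            i, j = rng.sample(range(num_layers), 2)
            if child[i] < levels - 1 and child[j] > 0:
                child[i] += 1  # compress layer i more ...
                child[j] -= 1  # ... and layer j less, budget unchanged
            if fitness(child) > best_fit:  # keep strictly better children
                parent, best_fit = child, fitness(child)
    return parent, best_fit

# Hypothetical per-layer sensitivities: a non-uniform profile should
# shift compression toward the least sensitive layers.
sens = [3.0, 1.0, 0.5, 2.0]
profile, fit = evolutionary_search(
    num_layers=4, levels=5, target_sum=8,
    fitness=lambda p: -sum(s * l * l for s, l in zip(sens, p)))
```

With these made-up sensitivities, the search drives budget away from the high-sensitivity layers while keeping the total level sum fixed at 8, mirroring how a profile search can beat a uniform allocation without ever scoring layers in isolation.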

Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Language Modeling | WikiText-2 (test) | PPL | 5.48 | 1949 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 7.64 | 1624 |
| Language Modeling | C4 | Perplexity | 12.53 | 1422 |
| Language Modeling | C4 | Perplexity | 33.72 | 1071 |
| Language Modeling | C4 (val) | PPL | 7.65 | 514 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) | 5.42 | 387 |
| Language Modeling | WikiText2 v1 (test) | Perplexity | 5.74 | 383 |
| Language Modeling | Wiki | Perplexity (PPL) | 28.76 | 281 |
| Zero-shot Reasoning | Reasoning Suite Zero-shot (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test) | PIQA | 76.17 | 177 |
| Zero-shot Common Sense Reasoning | Zero-shot Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (test) | PIQA | 77.69 | 95 |
(10 of 12 benchmark rows shown)
