EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
About
The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the importance of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating dynamic compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for dynamic compression of Llama, Mistral, and Phi models, setting new benchmarks for structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths. Our code is available at https://github.com/IST-DASLab/EvoPress.
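To make the optimization view concrete, here is a minimal toy sketch of an evolutionary search over per-layer compression levels under a fixed global budget. It is not the EvoPress implementation: the names `toy_loss`, `mutate`, and `evolutionary_search`, the sensitivity values, and the search hyperparameters are all assumptions made for illustration. The toy loss includes a coupling term between neighbouring layers so that layer contributions are not independent, and the mutation moves one compression step from one layer to another so the total compression stays constant.

```python
import random


def toy_loss(profile):
    """Hypothetical stand-in for a calibration loss (no real model involved)."""
    sensitivity = [5.0, 1.0, 1.0, 3.0, 0.5, 0.5, 2.0, 4.0]  # made-up values
    loss = sum(s * v * v for s, v in zip(sensitivity, profile))
    # Coupling between adjacent layers: contributions are not independent.
    loss += sum(0.5 * profile[k] * profile[k + 1] for k in range(len(profile) - 1))
    return loss


def mutate(profile, num_levels, rng):
    """Raise one layer's level and lower another's, keeping the sum fixed."""
    n = len(profile)
    pairs = [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and profile[i] < num_levels - 1 and profile[j] > 0
    ]
    child = list(profile)
    if pairs:  # if no valid move exists, return an unchanged copy
        i, j = rng.choice(pairs)
        child[i] += 1
        child[j] -= 1
    return child


def evolutionary_search(loss_fn, start, num_levels, generations=200, offspring=8, seed=0):
    """Elitist search: keep the parent unless an offspring is at least as good."""
    rng = random.Random(seed)
    parent, parent_loss = list(start), loss_fn(start)
    for _ in range(generations):
        children = [mutate(parent, num_levels, rng) for _ in range(offspring)]
        best = min(children, key=loss_fn)
        best_loss = loss_fn(best)
        if best_loss <= parent_loss:
            parent, parent_loss = best, best_loss
    return parent, parent_loss


# Start from a uniform profile: level 2 (out of levels 0..4) in each of 8 layers.
start = [2] * 8
best, best_loss = evolutionary_search(toy_loss, start, num_levels=5)
print(best, best_loss)
```

Because every mutation preserves the sum of the levels, each candidate satisfies the global compression target by construction, and the search only has to compare losses. In a real setting the loss would presumably come from evaluating the compressed model on calibration data, which is far more expensive than this toy function.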
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 5.48 | 1541 |
| Language Modeling | C4 | Perplexity | 12.53 | 1182 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 7.64 | 841 |
| Language Modeling | C4 (val) | PPL | 7.65 | 392 |
| Language Modeling | WikiText2 v1 (test) | Perplexity | 5.74 | 341 |
| Language Modeling | C4 | Perplexity | 33.72 | 321 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) | 5.42 | 277 |
| Language Modeling | Wiki | Perplexity (PPL) | 28.76 | 251 |
| Zero-shot Reasoning | Reasoning Suite Zero-shot (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test) | PIQA | 76.17 | 119 |
| Zero-shot Common Sense Reasoning | Zero-shot Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (test) | PIQA | 77.69 | 95 |