Fast and Effective Weight Update for Pruned Large Language Models
About
Pruning large language models (LLMs) is a challenging task due to their enormous size. The primary difficulty is fine-tuning the model after pruning, which is needed to recover the performance lost by dropping weights. Recent approaches have either ignored fine-tuning entirely, focusing on efficient pruning criteria, or attempted layer-wise weight updates that preserve the behavior of each layer. However, even layer-wise weight updates can be costly for LLMs, and previous works have resorted to various approximations. In our paper, we propose a fast and effective weight update algorithm for pruned layers based on the Alternating Direction Method of Multipliers (ADMM). We further extend it with a simple gradual pruning mask selection and achieve state-of-the-art pruning performance across a wide range of LLMs. Code is available at https://github.com/fmfi-compbio/admm-pruning.
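To illustrate the idea, here is a minimal NumPy sketch of an ADMM-style weight update for a single pruned linear layer, solving min ||XW - XW0||_F^2 subject to W being zero outside a fixed mask. The function name, hyperparameters (`rho`, `iters`), and initialization are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def admm_prune_layer(X, W0, mask, rho=1.0, iters=50):
    """Sketch of an ADMM weight update for one pruned linear layer.

    Minimizes ||X W - X W0||_F^2 subject to W = 0 where mask == 0.
    X:    (n_samples, d_in) calibration activations
    W0:   (d_in, d_out) original dense weights
    mask: (d_in, d_out) binary pruning mask (1 = keep, 0 = prune)
    """
    H = X.T @ X                        # layer-wise Gram matrix of activations
    A = H + rho * np.eye(H.shape[0])   # regularized system matrix for W-update
    HW0 = H @ W0
    Z = mask * W0                      # start from naively pruned weights
    U = np.zeros_like(W0)              # scaled dual variable
    for _ in range(iters):
        # W-update: closed-form solve of the quadratic subproblem
        W = np.linalg.solve(A, HW0 + rho * (Z - U))
        # Z-update: projection onto the sparsity pattern
        Z = mask * (W + U)
        # dual update
        U = U + W - Z
    return Z
```

Because the W-update has a closed-form solution (one linear solve per iteration with a fixed matrix), this converges quickly compared to gradient-based layer-wise fine-tuning.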
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 7.78 | 1541 |
| Commonsense Reasoning | HellaSwag | Accuracy | 53.35 | 1460 |
| Question Answering | ARC Challenge | Accuracy | 39.68 | 749 |
| Question Answering | ARC Easy | Accuracy | 72.77 | 386 |
| Natural Language Inference | RTE | Accuracy | 61.37 | 367 |
| Language Modeling | C4 | Perplexity | 8.11 | 321 |
| Language Modeling | Wiki | Perplexity (PPL) | 5.92 | 251 |
| Question Answering | BoolQ | Accuracy | 76.24 | 240 |
| Question Answering | OpenBookQA | Accuracy | 31.4 | 84 |
| Zero-shot Accuracy | ARC Easy | Zero-shot Accuracy | 68.18 | 63 |