
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models

About

Despite the remarkable success of Large Language Models (LLMs), their massive size poses significant deployment challenges, particularly on resource-constrained hardware. While existing LLM compression methods focus on quantization, pruning remains relatively unexplored due to the high training costs and data collection requirements of training-based approaches. One-shot pruning methods, although cost-effective and data-free, have become dominant in LLM pruning, yet they suffer notable performance declines in the structured pruning setting. In this work, we introduce Compresso, a new paradigm for structurally pruning LLMs in which a resource-efficient pruning algorithm and the LLM itself collaborate to learn optimal pruning decisions during training. Compresso addresses the challenges of expensive training costs and data collection by incorporating Low-Rank Adaptation (LoRA) into $L_0$ regularization during instruction tuning. It further augments the pruning algorithm with a collaborative prompt that fosters cooperation between the LLM and the pruning algorithm, significantly boosting overall performance. As a result, Compresso prunes LLaMA-7B to 5.4B parameters while maintaining the original performance, even surpassing LLaMA-7B on reading comprehension by 2.62%. Extensive experiments show that Compresso significantly outperforms one-shot pruning baselines across various sparsity ratios, achieving up to 2.21%, 11.43%, 7.04%, and 4.81% higher scores on the commonsense reasoning, reading comprehension, MMLU, and BBH benchmarks, respectively.
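The core idea of combining $L_0$ regularization with LoRA can be illustrated with a minimal NumPy sketch. This is an assumption-laden reconstruction, not the paper's implementation: it uses the standard hard-concrete gate parameterization of Louizos et al. (2018) for the differentiable $L_0$ penalty, illustrative hyperparameter values, and hypothetical function names. Each gate masks a whole output unit (a row of the weight matrix), which is what makes the pruning structured, while the frozen base weight `W` is adapted only through the low-rank update `B @ A`.

```python
import numpy as np

# Hard-concrete parameters from Louizos et al. (2018); these values are
# common defaults, used here as illustrative assumptions.
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1

def sample_gates(log_alpha, rng):
    """Sample hard-concrete gates z in [0, 1], one per prunable unit."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / BETA))
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def expected_l0(log_alpha):
    """Probability each gate is nonzero: the differentiable L0 penalty."""
    return 1.0 / (1.0 + np.exp(-(log_alpha - BETA * np.log(-GAMMA / ZETA))))

def pruned_lora_forward(x, W, A, B, log_alpha, rng):
    """Forward pass: frozen W plus LoRA update B @ A, with output units
    masked row-wise by the learned gates (structured pruning)."""
    z = sample_gates(log_alpha, rng)   # one gate per output unit
    W_eff = (W + B @ A) * z[:, None]   # masking whole rows removes units
    return x @ W_eff.T, z

# Toy usage: d_in=8, d_out=4, LoRA rank r=2, batch of 3 inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
A = rng.normal(size=(2, 8))
B = np.zeros((4, 2))               # LoRA's B starts at zero, as in LoRA
log_alpha = np.zeros(4)            # learnable gate logits
y, z = pruned_lora_forward(rng.normal(size=(3, 8)), W, A, B, log_alpha, rng)
```

During training, the mean of `expected_l0` would be added to the instruction-tuning loss to push gates toward zero; gradients flow only into `A`, `B`, and `log_alpha`, keeping the approach resource-efficient. At the end, units whose gates settle at zero are physically removed.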

Song Guo, Jiahang Xu, Li Lyna Zhang, Mao Yang • 2023

Related benchmarks

Task                     | Dataset                                                                              | Result                                     | Rank
Classification           | Zero-shot Evaluation Suite (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA)  | Average Accuracy (Zero-Shot Suite): 59.51  | 94
Commonsense Reasoning    | Commonsense Reasoning Suite (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) | Average Accuracy: 59.51                    | 43
Zero-shot Classification | Classification Datasets (MMLU, OBQA, ARC-e, WinoGrande, ARC-c, PIQA, HellaSwag)      | MMLU (5-shot): 31.9                        | 18
