Automatic Pruning Discovery for Large Language Models

About

Large language models (LLMs) have achieved remarkable performance on a wide range of tasks, but their massive size hinders real-world deployment. Existing pruning methods tailored for LLMs (e.g., Wanda) rely heavily on manually designed pruning algorithms, which incur high labor costs and require expert knowledge. Furthermore, we are the first to identify a serious outlier value issue, caused by uniform sparsity, that underlies the dramatic performance degradation at high pruning ratios. This raises the further question of how to design adaptive pruning sparsity suited to LLMs. Can LLMs prune by themselves? In this work, we give an affirmative answer by proposing a novel pruning method called AutoPrune. It is the first method to overcome the reliance on expert knowledge: it leverages LLMs to automatically design optimal pruning algorithms for themselves. Specifically, to mitigate the black-box nature of LLMs, we propose a Graph-driven Chain-of-Thought (GCoT) to optimize prompts. GCoT significantly enhances the reasoning process in learning the pruning algorithm and enables us to generate pruning algorithms with superior performance and interpretability in the next generation. Finally, grounded in our insights into the outlier value issue, we introduce Skew-aware Dynamic Sparsity Allocation (SDSA), which overcomes this issue and mitigates performance degradation under high pruning ratios. We conduct extensive experiments on mainstream LLM benchmarks, demonstrating that AutoPrune consistently outperforms state-of-the-art competitors.
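The abstract contrasts Wanda-style pruning under a single uniform sparsity with adaptive, per-layer sparsity. The sketch below illustrates both ideas in dependency-free Python. It is illustrative only.

`wanda_scores` and `prune_rows` follow the published Wanda metric. That metric scores each weight as |W_ij| times the L2 norm of input feature j over calibration tokens, and it compares weights within each output row.

`allocate_sparsity` is a hypothetical stand-in for adaptive allocation. It lowers sparsity for layers whose score distributions are more right-skewed (outlier-heavy) while keeping the average at the target. It is not the paper's SDSA, whose formulation is not given here. The function names, the `strength` parameter, and the skewness heuristic are all assumptions.

```python
import math
from statistics import mean

Matrix = list[list[float]]


def wanda_scores(weight: Matrix, activations: Matrix) -> Matrix:
    """Wanda importance |W_ij| * ||X_j||_2.

    weight: out_features x in_features; activations: tokens x in_features.
    """
    in_features = len(weight[0])
    # L2 norm of each input feature (column j of X) over all calibration tokens.
    col_norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(in_features)]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in weight]


def prune_rows(weight: Matrix, scores: Matrix, sparsity: float) -> Matrix:
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for row, srow in zip(weight, scores):
        k = int(len(row) * sparsity)  # number of weights to remove from this row
        drop = set(sorted(range(len(row)), key=lambda j: srow[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


def skewness(values: list[float]) -> float:
    """Population skewness; returns 0.0 for a constant distribution."""
    m = mean(values)
    sd = math.sqrt(mean([(v - m) ** 2 for v in values]))
    if sd == 0:
        return 0.0
    return mean([((v - m) / sd) ** 3 for v in values])


def allocate_sparsity(layer_scores: list[Matrix], target: float, strength: float = 0.1) -> list[float]:
    """HYPOTHETICAL adaptive allocation (not the paper's SDSA).

    Layers whose scores are more right-skewed (heavier outliers) get lower
    sparsity than average; less-skewed layers get higher sparsity.
    """
    skews = [skewness([s for row in scores for s in row]) for scores in layer_scores]
    avg = mean(skews)
    # Deviations from the mean skew sum to zero, so the average sparsity stays
    # at `target` unless the clipping to [0, 1] changes a value.
    return [min(max(target - strength * (s - avg), 0.0), 1.0) for s in skews]


if __name__ == "__main__":
    weight = [[1.0, -2.0, 0.9, 4.0], [0.1, 3.0, -1.0, 0.2]]
    acts = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 2.0, 1.0]]
    print(prune_rows(weight, wanda_scores(weight, acts), 0.5))
    # → [[0.0, 0.0, 0.9, 4.0], [0.0, 3.0, -1.0, 0.0]]
    flat = [[1.0, 1.0], [1.0, 1.0]]
    spiky = [[1.0, 1.0], [1.0, 10.0]]
    print([round(s, 3) for s in allocate_sparsity([flat, spiky], target=0.5)])
    # → [0.558, 0.442]
```

In the first row, pruning by magnitude alone would remove 0.9 and 1.0. The Wanda score instead keeps 0.9, because its input feature has a larger activation norm, and removes -2.0. In the allocation example, the layer with the outlier score (10.0) is assigned less sparsity, while the mean stays at 0.5.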

Haidong Kang, Lihong Lin, Enneng Yang, Hongning Dai, Hao Wang • 2025

Related benchmarks

Task                        Dataset         Metric                 Result  Rank
Commonsense Reasoning       WinoGrande      Accuracy               78.89   1442
Natural Language Inference  RTE             Accuracy               72.56   590
Question Answering          OBQA            Accuracy               37.6    347
Science Question Answering  ARC-C           Accuracy               50.97   261
Science Question Answering  ARC-E           Accuracy               80.06   240
Language Modeling           WikiText        Word Perplexity        4       234
Question Answering          BoolQ           Accuracy               84.79   201
Question Answering          OBQA            Accuracy (Normalized)  38.8    29
Question Answering          ARC-E           Accuracy (%)           81.2    15
Language Modeling           WikiText (val)  Perplexity (Dense)     8.59    4
