ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

About

As Large Language Models (LLMs) continue to advance in performance, their size has escalated significantly, with current LLMs containing billions or even trillions of parameters. However, in this study, we discovered that many layers of LLMs exhibit high similarity, and some layers play a negligible role in network functionality. Based on this observation, we define a metric called Block Influence (BI) to gauge the significance of each layer in LLMs. We then propose a straightforward pruning approach: layer removal, in which we directly delete the redundant layers in LLMs based on their BI scores. Experiments demonstrate that our method, which we call ShortGPT, significantly outperforms previous state-of-the-art (SOTA) methods in model pruning. Moreover, ShortGPT is orthogonal to quantization-like methods, enabling further reduction in parameters and computation. The ability to achieve better results through simple layer removal, as opposed to more complex pruning techniques, suggests a high degree of redundancy in the model architecture.

Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen • 2024
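
The abstract names the Block Influence (BI) metric and the layer-removal step without giving formulas. The sketch below illustrates one way the two fit together. It assumes BI is one minus the average cosine similarity between a layer's input and output hidden states, which is our reading of the paper's definition and is not stated in the text above. The function names (`block_influence`, `layers_to_remove`) and the toy hidden states are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_influence(hidden_in, hidden_out):
    """Assumed BI: 1 - mean cosine similarity over tokens.

    hidden_in and hidden_out are lists of per-token vectors holding the
    hidden states entering and leaving one layer.
    """
    sims = [cosine_similarity(x, y) for x, y in zip(hidden_in, hidden_out)]
    return 1.0 - sum(sims) / len(sims)


def layers_to_remove(hidden_states, num_remove):
    """Return the indices of the num_remove layers with the lowest BI.

    hidden_states[i] is the input to layer i and hidden_states[i + 1]
    is its output, so a model with n layers supplies n + 1 entries.
    """
    scores = [
        block_influence(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:num_remove]), scores


# Toy hidden states (hypothetical): 3 layers, 2 tokens, 2 dimensions.
toy_states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # layer 0 rotates each vector by 90 degrees
    [[0.0, 2.0], [2.0, 0.0]],  # layer 1 only rescales, direction unchanged
    [[1.0, 1.0], [1.0, 1.0]],  # layer 2 changes direction by 45 degrees
]

removed, scores = layers_to_remove(toy_states, num_remove=1)
print(removed)  # layer 1 leaves directions unchanged, so it has the lowest BI
```

In this toy case the layer that only rescales its input gets a BI of zero and is selected for removal, while the layer that rotates its input gets the highest score. With a real model the hidden states would come from a forward pass over calibration text, and the selected layers would then be deleted from the layer stack.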

Related benchmarks

Task                                  Dataset            Result                 Rank
Language Modeling                     WikiText2          Perplexity 14.78       3785
Language Modeling                     WikiText-2 (test)  PPL 13.59              2333
Language Modeling                     WikiText-2         Perplexity (PPL) 6.79  2320
Commonsense Reasoning                 HellaSwag          Accuracy 67.8          1896
Language Modeling                     C4                 Perplexity 13.5906     1565
Commonsense Reasoning                 WinoGrande         Accuracy 70.8          1442
Image Classification                  ImageNet-1K        Top-1 Acc 79.7         1239
Language Modeling                     PTB                Perplexity 54.5982     1234
Code Generation                       HumanEval          --                     1043
Text-based Visual Question Answering  TextVQA            Accuracy 33.69         962
Showing 10 of 106 rows