SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

About

Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning because LLMs exhibit block-level redundancy, with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB outperforms previous LLM pruning methods in accelerating LLM inference while also maintaining superior perplexity and accuracy, making SLEB a promising technique for enhancing the efficiency of LLMs. The code is available at: https://github.com/jiwonsong-dev/SLEB.

Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim • 2024
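
To illustrate the block-level redundancy the abstract describes, the sketch below scores each transformer block by the cosine similarity between its input and output hidden states on a sample sentence; blocks whose output barely differs from their input are natural elimination candidates. This is a minimal sketch of the idea only: the model name is an illustrative assumption, and the similarity score is a stand-in for SLEB's actual redundancy criterion (see the repository above for the authors' implementation).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical example model; SLEB's experiments target larger OPT and
# LLaMA models, and this similarity score is only a stand-in for the
# paper's actual redundancy metric.
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Large language models exhibit block-level redundancy."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # output_hidden_states=True returns the embedding output plus the
    # output of every transformer block: num_layers + 1 tensors.
    hidden = model(**inputs, output_hidden_states=True).hidden_states

for i in range(len(hidden) - 1):
    x_in = hidden[i].flatten(0, 1)       # block input,  (tokens, dim)
    x_out = hidden[i + 1].flatten(0, 1)  # block output, (tokens, dim)
    sim = torch.nn.functional.cosine_similarity(x_in, x_out, dim=-1).mean()
    print(f"block {i:2d}: mean input/output cosine similarity = {sim:.4f}")
# Blocks whose output is nearly identical to their input contribute
# little and are natural candidates for elimination.
```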

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 14.2428 | 2839 |
| Language Modeling | WikiText-2 (test) | PPL | 5.85 | 1949 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 7.08 | 1624 |
| Language Modeling | C4 | Perplexity | 12.9682 | 1422 |
| Language Modeling | PTB | Perplexity | 52.9183 | 1034 |
| Language Modeling | WikiText2 v1 (test) | Perplexity | 4.88 | 383 |
| Subjectivity Classification | Subj | Accuracy | 52.1 | 329 |
| Question Classification | TREC | Accuracy | 14 | 259 |
| Zero-shot Reasoning | Reasoning Suite Zero-shot (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test) | PIQA | 78.18 | 177 |
| Sentiment Analysis | MR | Accuracy | 0.511 | 160 |
Showing 10 of 31 rows
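
The language-modeling rows above report perplexity. For reference, below is a minimal sketch of a common way to compute it on the WikiText-2 test set: average the next-token negative log-likelihood over fixed-length chunks, then exponentiate. The model name, window size, and chunking scheme are illustrative assumptions; the paper's exact evaluation protocol may differ.

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical example model and a simple non-overlapping-window setup;
# the paper's exact evaluation protocol may differ.
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

seq_len, nlls = 2048, []
with torch.no_grad():
    # Score fixed-length chunks; the remainder at the end is dropped.
    for start in range(0, ids.size(1) - seq_len, seq_len):
        chunk = ids[:, start:start + seq_len]
        # Passing labels=chunk makes the model return the mean
        # next-token negative log-likelihood as .loss.
        nlls.append(model(chunk, labels=chunk).loss.float())

print(f"WikiText-2 perplexity = {math.exp(torch.stack(nlls).mean()):.2f}")
```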
