Streamlining Redundant Layers to Compress Large Language Models

About

This paper introduces LLM-Streamline, a pioneering work on layer pruning for large language models (LLMs). It is based on the observation that different layers have varying impacts on hidden states, which makes it possible to identify less important layers for pruning. LLM-Streamline comprises two parts: layer pruning, which removes the consecutive layers with the lowest importance according to a target sparsity, and layer replacement, a novel module that trains a lightweight network to replace the pruned layers and mitigate the performance loss. Additionally, a new metric called stability is proposed to address the limitations of the widely used accuracy metric in evaluating model compression. Experiments show that LLM-Streamline outperforms both previous and concurrent state-of-the-art pruning methods in terms of both performance and training efficiency. Our code is available at https://github.com/RUCKBReasoning/LLM-Streamline

Xiaodong Chen, Yuxuan Hu, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen • 2024
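
The layer-pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how layer importance is measured, so the score used here (one minus the mean cosine similarity between a layer's input and output hidden states) is an assumption made for illustration, and the function names `layer_importance` and `select_consecutive_layers` are hypothetical rather than taken from the LLM-Streamline repository. The layer-replacement step (training a lightweight network in place of the removed layers) is not covered.

```python
import numpy as np


def layer_importance(hidden_states):
    """Score each layer by how much it changes the hidden states.

    hidden_states[i] is the input to layer i and hidden_states[i + 1] is its
    output, each an array of shape (tokens, dim).
    ASSUMPTION: importance = 1 - mean cosine similarity(input, output);
    the abstract does not specify the actual score.
    """
    scores = []
    for x_in, x_out in zip(hidden_states[:-1], hidden_states[1:]):
        num = np.sum(x_in * x_out, axis=-1)
        den = np.linalg.norm(x_in, axis=-1) * np.linalg.norm(x_out, axis=-1) + 1e-8
        scores.append(1.0 - float(np.mean(num / den)))
    return np.array(scores)


def select_consecutive_layers(importance, n_prune):
    """Return (start, end) of the n_prune consecutive layers whose summed
    importance is lowest; end is exclusive."""
    # Sliding-window sums over every run of n_prune adjacent layers.
    window_sums = np.convolve(importance, np.ones(n_prune), mode="valid")
    start = int(np.argmin(window_sums))
    return start, start + n_prune


# Toy importance scores for a 5-layer model; prune 2 consecutive layers.
print(select_consecutive_layers(np.array([0.5, 0.4, 0.01, 0.02, 0.3]), 2))  # (2, 4)
```

In this toy example, layers 2 and 3 change the hidden states the least, so they form the block selected for removal. The number of layers to prune would follow from the target sparsity.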

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText2 | Perplexity 9.76 | 3785
Commonsense Reasoning | HellaSwag | Accuracy 61.2 | 1896
Commonsense Reasoning | HellaSwag | HellaSwag Accuracy 45.9 | 711
Physical Commonsense Reasoning | PIQA | Accuracy 72 | 696
Multitask Language Understanding | MMLU | Accuracy 45.5 | 520
Physical Interaction Question Answering | PIQA | Accuracy 71.5 | 415
Diagram Question Answering | AI2D | AI2D Accuracy 65.4 | 387
Mathematical Reasoning | MathVista | Accuracy 56.7 | 382
Chart Question Answering | ChartQA | -- | 371
Multi-discipline Multimodal Understanding | MMMU | -- | 363
Showing 10 of 64 rows
