
FlattenGPT: Depth Compression for Transformer with Layer Flattening

About

Recent works have revealed redundancy across transformer blocks, motivating depth compression methods that prune less important blocks. However, existing approaches that prune entire blocks risk discarding meaningful cues learned in those blocks, leading to substantial performance degradation. As another line of model compression, channel pruning better preserves performance, but it cannot reduce model depth and is challenged by inconsistent pruning ratios across individual layers. To pursue better model compression and acceleration, this paper proposes FlattenGPT, a novel way to detect and reduce depth-wise redundancy. By flattening two adjacent blocks into one, FlattenGPT compresses the network depth while enabling more effective detection and removal of parameter redundancy. It preserves the knowledge learned in all blocks and remains consistent with the original transformer architecture. Extensive experiments demonstrate that FlattenGPT enhances model efficiency with a decent performance trade-off. It outperforms existing pruning methods in both zero-shot accuracy and WikiText-2 perplexity across various model types and parameter sizes. On LLaMA-2/3 and Qwen-1.5 models, FlattenGPT retains 90-96% of zero-shot performance at a 20% compression ratio. It also outperforms other pruning methods in accelerating LLM inference, making it promising for improving the efficiency of transformers.
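To build intuition for depth compression via flattening, consider the purely linear case: two stacked linear layers compose into a single layer that computes the exact same function, halving depth. This is only a toy sketch of the general idea, not the paper's actual algorithm (transformer blocks also contain attention and nonlinearities, which FlattenGPT must handle); all names below are illustrative.

```python
import numpy as np

# Toy illustration: two stacked linear maps y = W2 @ (W1 @ x) can be
# "flattened" into one map y = W_flat @ x with W_flat = W2 @ W1,
# reducing depth from 2 to 1 while preserving the function exactly.
rng = np.random.default_rng(0)
d = 8
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
x = rng.standard_normal(d)

deep_out = W2 @ (W1 @ x)   # two-layer (deep) computation
W_flat = W2 @ W1           # merged single-layer weights
flat_out = W_flat @ x      # one-layer (flattened) computation

# The flattened layer reproduces the deep computation exactly.
assert np.allclose(deep_out, flat_out)
```

After such a merge, the resulting (denser) single layer becomes a natural target for the kind of parameter-redundancy removal the abstract describes, since all the original weights survive in `W_flat` rather than being discarded wholesale.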

Ruihan Xu, Qingpei Guo, Yao Zhu, Xiangyang Ji, Ming Yang, Shiliang Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 6.68 | 1541 |
| Image Classification | ImageNet-1K | Top-1 Acc | 81.6 | 836 |
| Image Classification | ImageNet A | Top-1 Acc | 74.6 | 553 |
| Language Modeling | WikiText2 v1 (test) | Perplexity | 4.79 | 341 |
| Image Classification | ImageNet-R | Accuracy | 93.7 | 148 |
| Zero-shot Reasoning | Reasoning Suite Zero-shot (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test) | PIQA | 76.33 | 119 |
| Zero-shot Common Sense Reasoning | Zero-shot Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (test) | PIQA | 80.36 | 95 |
| Zero-shot Evaluation | Tasks Zero-shot (mean) | mAcc | 73.94 | 25 |
