A Simple Linear Patch Revives Layer-Pruned Large Language Models
About
Layer pruning has emerged as a widely used technique for compressing large language models (LLMs). However, existing layer pruning approaches often incur substantial performance degradation. We trace the majority of this degradation to a single, previously overlooked issue: *the mismatch of activation magnitudes at the pruning interface*. The pre-interface activations exhibit significantly different scales from the post-interface ones, causing a distributional shift that propagates through the remaining layers. To address this issue, we introduce **LinearPatch**, a lightweight and plug-and-play technique that fuses two operations into one matrix multiply at the pruning interface: (i) a Hadamard transformation that suppresses massive outliers at particular tokens and (ii) a channel-wise scaling that aligns activation statistics. On LLaMA-3-8B, LinearPatch preserves up to **94.15%** of the original model's performance when pruning 5 out of 32 layers, outperforming the previous state of the art by **4%**. The patch can be further refined with 5K unlabeled samples via memory-efficient offline distillation, pushing the retention to 95.16% within only 30 minutes on a single GPU. Code is available at https://github.com/chenxinrui-tsinghua/LinearPatch.
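The sketch below illustrates the general idea of such a fused patch, assuming calibration activations collected just before and just after the pruned block. The function name `build_linear_patch`, the RMS-based scaling rule, and the Sylvester construction of the Hadamard matrix are illustrative assumptions, not the authors' released implementation.

```python
import torch

def build_linear_patch(pre_acts: torch.Tensor, post_acts: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a LinearPatch-style matrix (not the official code).

    pre_acts, post_acts: (num_tokens, hidden_dim) calibration activations taken
    just before and just after the pruned block. Returns a (hidden_dim, hidden_dim)
    matrix P so that the whole patch is one matrix multiply: x_patched = x_pre @ P.
    """
    d = pre_acts.shape[-1]
    assert d & (d - 1) == 0, "this sketch assumes hidden_dim is a power of two"

    # Normalized Hadamard matrix (orthogonal), built by Sylvester recursion.
    H = torch.ones(1, 1, dtype=pre_acts.dtype)
    while H.shape[0] < d:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    H = H / (d ** 0.5)

    # Rotate to the Hadamard basis, which spreads per-token outliers across channels.
    pre_rot = pre_acts @ H
    post_rot = post_acts @ H

    # Channel-wise scale (assumed here to match RMS magnitudes) that aligns
    # pre-interface statistics with post-interface ones.
    scale = post_rot.pow(2).mean(0).sqrt() / pre_rot.pow(2).mean(0).sqrt().clamp_min(1e-6)

    # Fuse rotate -> scale -> rotate back into a single matrix.
    return H @ torch.diag(scale) @ H.T
```

At inference time the patch would simply be applied to the hidden states crossing the pruning interface, e.g. `hidden = hidden @ P`, adding a single matrix multiply and no extra parameters elsewhere in the model.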
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Question Answering | ARC Challenge | Accuracy | 43.17 | 749 |
| Question Answering | ARC Easy | Accuracy | 64.35 | 386 |
| Question Answering | WinoGrande (WG) | Accuracy | 70.17 | 98 |
| Question Answering | PIQA | Accuracy | 73.23 | 83 |
| Multiple-choice Question Answering | HellaSwag | Accuracy | 69.33 | 59 |
| Question Answering | WinoGrande, HellaSwag, ARC-e, ARC-c, PIQA Average | Avg Accuracy | 62.75 | 35 |