
SimDiff: Depth Pruning via Similarity and Difference

About

Depth pruning improves the deployment efficiency of large language models (LLMs) by identifying and removing redundant layers. A widely accepted standard for this identification process is to measure the similarity between layers using cosine distance. However, we find that methods relying solely on this one-dimensional heuristic can exhibit unpredictable performance and even catastrophic collapse across different architectures. To address this issue, we propose SimDiff, a novel layer importance criterion that jointly evaluates layers from two orthogonal perspectives: representational similarity and transformation difference. The difference is quantified using two distinct metrics: MSSD, which is sensitive to outliers and identifies layers that make decisive corrections, and MASD, which robustly measures a layer's average contribution. Extensive experiments on multiple models ranging from 0.5B to 13B parameters demonstrate that SimDiff significantly outperforms state-of-the-art baselines across various pruning ratios. Notably, our method retains over 91% of LLaMA2-7B's performance at a 25% pruning ratio and achieves up to a 1.49x inference speedup when pruning 12 layers on LLaMA3.1-8B. We also show that pruned models can be effectively recovered with minimal fine-tuning.
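The abstract contrasts a cosine-similarity criterion with a score that combines similarity and difference. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. The abstract does not give the exact formulas, so several pieces here are hypothetical: the definitions of MSSD and MASD (mean squared and mean absolute element-wise difference between a layer's input and output hidden states), the max-normalization, the additive combination weighted by `alpha`, and all function names. The abstract itself says only that MSSD is outlier-sensitive and that MASD robustly measures a layer's average contribution.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Hidden = Sequence[Vector]  # one hidden-state vector per token


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, guarding against zero-norm vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def layer_stats(h_in: Hidden, h_out: Hidden) -> Tuple[float, float, float]:
    """Return (mean cosine similarity, MSSD, MASD) for one layer.

    ASSUMED definitions (not taken from the paper): MSSD is the mean squared
    element-wise difference and MASD the mean absolute element-wise
    difference between the layer's input and output hidden states.
    """
    sim = sum(cosine(x, y) for x, y in zip(h_in, h_out)) / len(h_in)
    diffs = [b - a for x, y in zip(h_in, h_out) for a, b in zip(x, y)]
    mssd = sum(d * d for d in diffs) / len(diffs)  # squaring amplifies outliers
    masd = sum(abs(d) for d in diffs) / len(diffs)  # more robust average change
    return sim, mssd, masd


def _normalize(values: List[float]) -> List[float]:
    """Divide by the maximum so each term lies in [0, 1] across layers."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


def importance_scores(hidden_states: Sequence[Hidden], alpha: float = 0.5) -> List[float]:
    """Score each layer; hidden_states[i] is layer i's input, [i + 1] its output.

    HYPOTHETICAL combination: dissimilarity + alpha * MSSD + (1 - alpha) * MASD,
    each normalized across layers. This is not the paper's formula.
    """
    stats = [layer_stats(hidden_states[i], hidden_states[i + 1])
             for i in range(len(hidden_states) - 1)]
    dissim = _normalize([1.0 - s for s, _, _ in stats])
    mssd = _normalize([m for _, m, _ in stats])
    masd = _normalize([a for _, _, a in stats])
    return [d + alpha * q + (1 - alpha) * r for d, q, r in zip(dissim, mssd, masd)]


def layers_to_prune(hidden_states: Sequence[Hidden], k: int, alpha: float = 0.5) -> List[int]:
    """Return the indices of the k lowest-scoring (most redundant) layers."""
    scores = importance_scores(hidden_states, alpha)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy model: 3 layers, 2 tokens, hidden size 2.
    hs = [
        [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
        [[0.0, 1.0], [1.0, 0.0]],  # layer 0 rotates every token
        [[0.0, 1.0], [1.0, 0.0]],  # layer 1 is an exact identity
        [[0.0, 2.0], [2.0, 0.0]],  # layer 2 rescales without rotating
    ]
    print(layers_to_prune(hs, 1))  # prints [1]
```

In this toy example, layers 1 and 2 both preserve every token's direction, so a cosine-only criterion gives them the same zero dissimilarity. The difference terms break the tie because layer 2 still changes the magnitude of its input. For scale, LLaMA2-7B has 32 decoder layers, so a 25% pruning ratio corresponds to removing 8 of them.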

Yuli Chen, Shuhao Zhang, Fanshen Meng, Bo Cheng, Jiale Han, Qiang Tong, Xiulei Liu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText2 | Perplexity: 16.06 | 3785
Zero-shot Language Evaluation | Zero-shot NLP Evaluation Suite (WikiText2, BoolQ, PIQA, HellaSwag, WinoGrande, ARC, OBQA, MTQA) (test) | WikiText2 Perplexity: 7.43 | 27
Zero-shot Language Reasoning | BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA, MTQA (zero-shot) | BoolQ Accuracy: 71.8 | 21
Zero-shot Natural Language Understanding | NLU Benchmark Suite (CMNLI, HeSW, PIQA, WSC, CoQA, BoolQ, Race-M, Race-H, XSum, C3) | CMNLI Accuracy: 34.4 | 8
Commonsense Reasoning | Evaluation Suite Zero-shot (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA, MTQA) v1 (test) | BoolQ Accuracy (Zero-shot): 74.83 | 6
