GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression
About
Recent studies have demonstrated that many layers in large language models (LLMs) are functionally redundant, enabling model compression by removing these layers to reduce inference cost. While such approaches can improve efficiency, indiscriminate layer pruning often causes significant performance degradation. In this paper, we propose GRASP (Gradient-based Retention of Adaptive Singular Parameters), a novel compression framework that mitigates this issue by preserving sensitivity-aware singular values. Unlike direct layer pruning, GRASP leverages gradient-based attribution on a small calibration dataset to adaptively identify and retain the critical singular components of each redundant layer. By replacing redundant layers with only a minimal set of parameters, GRASP achieves efficient compression while maintaining strong performance with minimal overhead. Experiments across multiple LLMs show that GRASP consistently outperforms existing compression methods, retaining 90% of the original model's performance at a 20% compression ratio.
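The core idea — scoring each singular component of a layer's weight matrix by the gradient of a calibration loss with respect to its singular value, then keeping only the highest-scoring components — can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the function name `grasp_compress`, the squared-error calibration loss, and the first-order importance score `|s_i * dL/ds_i|` are all assumptions made for the example.

```python
import numpy as np

def grasp_compress(W, X, Y, k):
    """Sketch of gradient-based retention of singular components.

    W : (m, n) layer weight matrix to compress
    X : (n, b) calibration inputs, Y : (m, b) calibration targets
    k : number of singular components to retain
    """
    # Factor the weight: W = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(W, full_matrices=False)

    # Calibration loss L = ||W X - Y||_F^2, so dL/dW = 2 (W X - Y) X^T
    G = 2.0 * (W @ X - Y) @ X.T

    # Sensitivity of the loss to each singular value: dL/ds_i = u_i^T G v_i
    sens = np.einsum('mi,mi->i', U, G @ Vt.T)

    # First-order importance of removing component i: |s_i * dL/ds_i|
    score = np.abs(S * sens)
    keep = np.argsort(score)[::-1][:k]

    # Replace the full weight with its k retained singular components
    return U[:, keep] @ np.diag(S[keep]) @ Vt[keep, :]
```

Storing `U[:, keep]`, `S[keep]`, and `Vt[keep, :]` instead of the dense weight is what yields the parameter savings: a rank-`k` replacement costs `k * (m + n + 1)` parameters versus `m * n` for the original layer.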
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 9.59 | 1541 |
| Commonsense Reasoning | HellaSwag | Accuracy | 62.7 | 1460 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 73.3 | 329 |
| Boolean Question Answering | BoolQ | Accuracy | 68.4 | 307 |
| Reading Comprehension | RACE high | Accuracy | 36.1 | 295 |
| Reading Comprehension | RACE mid | Accuracy | 35.1 | 196 |
| Long-context Language Understanding | LongBench (test) | Average Score | 26.26 | 133 |
| Coreference Resolution | WSC | Accuracy | 41.4 | 96 |
| Multi-task Language Understanding | MMLU | Accuracy | 43.1 | 87 |
| Chinese Multitask Language Understanding | CMMLU | Accuracy | 30.7 | 50 |