
GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression

About

Recent studies have demonstrated that many layers in large language models (LLMs) are functionally redundant, enabling model compression by removing these layers to reduce inference cost. While such approaches can improve efficiency, indiscriminate layer pruning often results in significant performance degradation. In this paper, we propose GRASP (Gradient-based Retention of Adaptive Singular Parameters), a novel compression framework that mitigates this issue by preserving sensitivity-aware singular values. Unlike direct layer pruning, GRASP leverages gradient-based attribution on a small calibration dataset to adaptively identify and retain critical singular components. By replacing redundant layers with only a minimal set of parameters, GRASP achieves efficient compression while maintaining strong performance with minimal overhead. Experiments across multiple LLMs show that GRASP consistently outperforms existing compression methods, retaining 90% of the original model's performance at a 20% compression ratio.
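The core idea above can be sketched in code: factor a redundant layer's weight matrix with SVD, score each singular component by a gradient-based attribution computed from a calibration batch, and keep only the top-scoring components as a low-rank replacement. This is a minimal illustration, not the paper's implementation; the function name `grasp_compress` and the importance score |s_i · u_iᵀ G v_i| are assumptions for the sketch, and the paper's exact attribution criterion may differ.

```python
import numpy as np

def grasp_compress(W, G, k):
    """Sketch of singular-component retention in the spirit of GRASP.

    W: (m, n) weight matrix of a layer deemed redundant.
    G: (m, n) gradient of the calibration loss w.r.t. W.
    k: number of singular components to retain.
    Returns low-rank factors A (m, k), B (k, n) with W ~= A @ B.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Gradient attribution per component: dL/ds_i = u_i^T G v_i,
    # since W = sum_i s_i * u_i v_i^T.
    grad_s = np.einsum("mi,mn,in->i", U, G, Vt)
    # Hypothetical sensitivity score: |s_i * dL/ds_i| (first-order
    # estimate of the loss change from dropping component i).
    importance = np.abs(S * grad_s)
    keep = np.argsort(importance)[::-1][:k]
    A = U[:, keep] * S[keep]   # absorb singular values into the left factor
    B = Vt[keep, :]
    return A, B
```

Replacing an m-by-n weight with the two factors stores k(m + n) parameters instead of mn, so the layer's cost shrinks whenever k is small relative to min(m, n).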

Kainan Liu, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao • 2024

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText-2 (test) | PPL 9.59 | 1541
Commonsense Reasoning | HellaSwag | Accuracy 62.7 | 1460
Physical Commonsense Reasoning | PIQA | Accuracy 73.3 | 329
Boolean Question Answering | BoolQ | Accuracy 68.4 | 307
Reading Comprehension | RACE high | Accuracy 36.1 | 295
Reading Comprehension | RACE mid | Accuracy 35.1 | 196
Long-context Language Understanding | LongBench (test) | Average Score 26.26 | 133
Coreference Resolution | WSC | Accuracy 41.4 | 96
Multi-task Language Understanding | MMLU | Accuracy 43.1 | 87
Chinese Multi-task Language Understanding | CMMLU | Accuracy 30.7 | 50
Showing 10 of 18 rows
