ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
About
With the proliferation of task-specific large language models, delta compression has emerged as a method to mitigate the resource challenges of deploying numerous such models by effectively compressing the delta model parameters. Previous delta-sparsification methods either remove parameters randomly or truncate singular vectors directly after singular value decomposition (SVD). However, these methods either disregard parameter importance entirely or evaluate it at too coarse a granularity. In this work, we introduce ImPart, a novel importance-aware delta-sparsification approach. Leveraging SVD, it dynamically adjusts the sparsity ratios of different singular vectors based on their importance, effectively retaining crucial task-specific knowledge even at high sparsity ratios. Experiments show that ImPart achieves state-of-the-art delta-sparsification performance, demonstrating a $2\times$ higher compression ratio than baselines at the same performance level. When integrated with existing methods, ImPart sets a new state-of-the-art on delta quantization and model merging.
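The idea above can be sketched in code. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes each singular vector's importance is proxied by its normalized singular value, and that a per-vector keep ratio is allocated in proportion to that importance within a global sparsity budget (the function name and parameters are invented for illustration).

```python
import numpy as np

def importance_aware_sparsify(delta, target_sparsity=0.9, min_keep=0.01):
    """Sketch of importance-aware delta sparsification via SVD.

    Assumption: a singular vector's importance is proxied by its
    singular value, so more important vectors keep more entries.
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Turn normalized singular values into per-vector keep ratios whose
    # average matches the global keep budget (1 - target_sparsity).
    importance = S / S.sum()
    keep = np.clip(importance * len(S) * (1 - target_sparsity), min_keep, 1.0)

    for i, k in enumerate(keep):
        for vec in (U[:, i], Vt[i, :]):  # numpy views: edits hit U / Vt
            n_drop = int(len(vec) * (1 - k))
            if n_drop > 0:
                # Zero out the smallest-magnitude entries of this vector.
                drop_idx = np.argsort(np.abs(vec))[:n_drop]
                vec[drop_idx] = 0.0
    # Reconstruct the sparsified delta from the pruned factors.
    return (U * S) @ Vt
```

In this sketch, high-importance singular vectors (large singular values) receive keep ratios near 1 and are barely pruned, while low-importance vectors are aggressively sparsified, which mirrors the dynamic per-vector allocation described in the abstract.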
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 59.76 | 850 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 64.29 | 797 |
| Code Generation | HumanEval (test) | Pass@1 | 58.54 | 444 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy | 13.54 | 433 |
| Code Generation | MBPP (test) | Pass@1 | 68.5 | 276 |
| Code Generation | MBPP | Pass@1 | 68 | 175 |
| Instruction Following | IFEval (test) | IFEval Score | 35.3 | 45 |
| Instruction Following | AlpacaEval (test) | Helpfulness Score | 2.83e+3 | 32 |
| Chat | AlpacaEval | Win Rate | 1.88e+3 | 25 |
| Chat | IFEval | Loose Prompt Metric | 33.27 | 15 |