Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
About
Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the performance of task-specific fine-tuned LLMs (e.g., WizardMath for math problems). Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed precision. This method employs higher-bit representations for singular vectors corresponding to larger singular values. We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to fully fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin. Additionally, we show that our method is compatible with various backbone LLMs, such as Llama-2, Llama-3, and Mistral, highlighting its generalizability.
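The core idea above, assigning more bits to the singular vectors tied to the largest singular values of the delta weights, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the group sizes and bit widths below are illustrative, and a simple uniform symmetric quantizer stands in for whatever quantization scheme the method actually uses.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit width (illustrative)."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return x.copy()
    return np.round(x / scale) * scale

def mixed_precision_delta(base_w, finetuned_w, groups=((2, 8), (16, 3), (110, 2))):
    """Compress delta = finetuned_w - base_w via SVD with mixed-precision groups.

    `groups` is a sequence of (num_singular_vectors, bits) pairs, ordered from
    largest singular values to smallest, so the leading singular vectors get
    the higher bit widths. The split here is a made-up example allocation.
    Returns the reconstructed fine-tuned weight matrix.
    """
    delta = finetuned_w - base_w
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    approx = np.zeros_like(delta)
    start = 0
    for count, bits in groups:
        end = min(start + count, len(s))
        # Quantize this group's left/right singular vectors at its bit width.
        u_q = quantize(u[:, start:end], bits)
        v_q = quantize(vt[start:end, :], bits)
        approx += (u_q * s[start:end]) @ v_q
        start = end
        if start >= len(s):
            break
    return base_w + approx
```

Because storage is dominated by the many small-singular-value vectors kept at low precision, the cost stays close to a pure low-bit scheme while the high-energy directions retain high-precision accuracy.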
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 85 | 850 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 62.4 | 797 |
| Code Generation | HumanEval (test) | Pass@1 | 56.71 | 444 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy | 12.56 | 433 |
| Visual Question Answering | GQA | Accuracy | 62.8 | 374 |
| Code Generation | MBPP (test) | Pass@1 | 68.3 | 276 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 30 | 251 |
| Code Generation | MBPP | Pass@1 | 82.7 | 175 |
| Code Generation | MBPP | Accuracy (%) | 86.5 | 146 |
| Mathematical Reasoning | MATH500 | Accuracy (ACC) | 76.5 | 133 |