Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
About
Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the performance of task-specific fine-tuned LLMs (e.g., WizardMath for math problems). Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed precision. This method employs higher-bit representations for singular vectors corresponding to larger singular values. We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to fully fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin. Additionally, we show that our method is compatible with various backbone LLMs, such as Llama-2, Llama-3, and Mistral, highlighting its generalizability.
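The core idea above, assigning more bits to the singular vectors tied to the largest singular values of the delta weights, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the group sizes and bit widths below are illustrative, and a simple uniform symmetric quantizer stands in for whatever quantization scheme the method actually uses.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit width (illustrative)."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return x.copy()
    return np.round(x / scale) * scale

def mixed_precision_delta(base_w, finetuned_w, groups=((2, 8), (16, 3), (110, 2))):
    """Compress delta = finetuned_w - base_w via SVD with mixed-precision groups.

    `groups` is a sequence of (num_singular_vectors, bits) pairs, ordered from
    largest singular values to smallest, so the leading singular vectors get
    the higher bit widths. The split here is a made-up example allocation.
    Returns the reconstructed fine-tuned weight matrix.
    """
    delta = finetuned_w - base_w
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    approx = np.zeros_like(delta)
    start = 0
    for count, bits in groups:
        end = min(start + count, len(s))
        # Quantize this group's left/right singular vectors at its bit width.
        u_q = quantize(u[:, start:end], bits)
        v_q = quantize(vt[start:end, :], bits)
        approx += (u_q * s[start:end]) @ v_q
        start = end
        if start >= len(s):
            break
    return base_w + approx
```

Because storage is dominated by the many small-singular-value vectors kept at low precision, the cost stays close to a pure low-bit scheme while the high-energy directions retain high-precision accuracy.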
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 85 | 850 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 62.4 | 797 |
| Code Generation | HumanEval (test) | Pass@1 | 56.71 | 444 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy | 12.56 | 433 |
| Visual Question Answering | GQA | Accuracy | 62.8 | 374 |
| Code Generation | MBPP (test) | Pass@1 | 68.3 | 276 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 30 | 251 |
| Code Generation | MBPP | Pass@1 | 82.7 | 175 |
| Code Generation | MBPP | Accuracy (%) | 86.5 | 146 |
| Mathematical Reasoning | MATH500 | Accuracy (ACC) | 76.5 | 133 |