Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
About
Supervised Fine-Tuning (SFT) empowers Large Language Models (LLMs) with exceptional performance on specialized tasks, but it yields dense, high-dimensional delta parameters that pose severe storage and distribution challenges. Singular Value Decomposition (SVD)-based compression offers a compact representation for such delta parameters, but existing methods adopt heuristic quantization without clarifying the underlying mechanisms, leading to poor generalizability. In this work, we propose PrinMix, a rigorous SVD-based framework that models quantization as an optimization problem, grounding the design in mathematical mechanisms. We first theoretically derive the quantization error and identify a key singular-value-dominated scaling mechanism, which mathematically proves the necessity of mixed-precision quantization. We then model the quantization scheme as a 0/1 Integer Linear Programming (ILP) problem, which yields optimal bit-budget-constrained solutions without empirical assumptions. Furthermore, PrinMix integrates a Reconstruction Target Correction (RTC) method to compensate for errors introduced by the $\mathbf{V}$-then-$\mathbf{U}$ sequential quantization process. Extensive experiments confirm the effectiveness of PrinMix: on 7B LLMs, it outperforms the SOTA Delta-CoMe on challenging benchmarks by 22.3% on AIME2024 and 6.1% on GQA.
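The core pipeline described above — decompose the delta weights via SVD, then quantize the singular-vector groups at different bit widths according to the singular-value-dominated scaling mechanism — can be sketched as follows. This is a minimal illustration, not PrinMix itself: the function name, the fixed 8-bit/3-bit widths, and the top-25% split are illustrative assumptions, whereas PrinMix derives the bit allocation from its 0/1 ILP and further applies RTC during sequential quantization.

```python
import numpy as np

def compress_delta_sketch(delta, rank, high_bits=8, low_bits=3, split=0.25):
    """Illustrative SVD-based delta compression with mixed-precision quantization.

    Singular vectors paired with large singular values are amplified most
    during reconstruction, so they receive more bits: here the top `split`
    fraction of retained ranks gets `high_bits`, the rest `low_bits`.
    (Fixed widths/split are placeholders for the ILP-derived allocation.)
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank]   # low-rank truncation
    k = max(1, int(split * rank))                 # high-precision group size

    def quantize(M, bits):
        # Uniform symmetric quantization with a per-matrix scale.
        scale = np.abs(M).max() / (2 ** (bits - 1) - 1)
        return np.round(M / scale) * scale

    Uq = np.hstack([quantize(U[:, :k], high_bits), quantize(U[:, k:], low_bits)])
    Vq = np.vstack([quantize(Vt[:k], high_bits), quantize(Vt[k:], low_bits)])
    return Uq @ np.diag(S) @ Vq                   # reconstructed delta

# Toy low-rank delta: fine-tuning updates are approximately low-rank.
rng = np.random.default_rng(0)
delta = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64)) / 8
approx = compress_delta_sketch(delta, rank=16)
err = np.linalg.norm(delta - approx) / np.linalg.norm(delta)
```

Because the quantization noise on each singular-vector group is scaled by the corresponding singular values at reconstruction, spending the extra bits on the leading group is the cheapest way to cut the overall error under a fixed bit budget.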
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 85.6 | 850 |
| Visual Question Answering | GQA | Accuracy | 62.7 | 374 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 36.7 | 251 |
| Code Generation | MBPP | Pass@1 | 83.1 | 175 |
| Mathematical Reasoning | GSM8K | Math Score | 56.1 | 171 |
| Code Generation | MBPP | Accuracy (%) | 86.9 | 146 |
| Mathematical Reasoning | MATH500 | Accuracy | 80.2 | 133 |
| Science Question Answering | ScienceQA (SQA) | Accuracy | 79.4 | 128 |
| Code Generation | MBPP | MBPP Score | 50.2 | 35 |
| Visual Question Answering | SQA | Accuracy | 72.1 | 23 |