Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals
About
In this paper, we present an efficient method for storing fine-tuned models by leveraging the low-rank properties of weight residuals. Our key observation is that weight residuals in large overparameterized models exhibit even stronger low-rank characteristics. Based on this insight, we propose Efficient Residual Encoding (ERE), a novel approach that achieves efficient storage of fine-tuned model weights by approximating the low-rank weight residuals. Furthermore, we analyze the robustness of weight residuals and push the limit of storage efficiency by utilizing additional quantization and layer-wise rank allocation. Our experimental results demonstrate that our method significantly reduces memory footprint while preserving performance in various tasks and modalities. We release our code.
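The core idea described above, approximating the residual between fine-tuned and pre-trained weights with a truncated SVD, can be sketched as follows. This is a minimal illustration of low-rank residual encoding, not the paper's released implementation; the function names and the fixed per-layer rank are assumptions (the paper additionally applies quantization and layer-wise rank allocation).

```python
import numpy as np

def encode_residual(w_pretrained, w_finetuned, rank):
    """Store a fine-tuned layer as a rank-`rank` approximation of its residual.

    Returns two small factors (U_r, V_r) instead of the full residual matrix,
    reducing storage from m*n to (m + n) * rank values per layer.
    """
    residual = w_finetuned - w_pretrained
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    v_r = vt[:rank, :]
    return u_r, v_r

def decode_residual(w_pretrained, u_r, v_r):
    """Recover an approximate fine-tuned weight matrix from the stored factors."""
    return w_pretrained + u_r @ v_r
```

When the true residual is (near) low-rank, as the paper observes for large overparameterized models, the reconstruction is close to the original fine-tuned weights while the stored factors are far smaller than the dense matrix.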
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 84.1 | 850 |
| Visual Question Answering | GQA | Accuracy | 57 | 374 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 13.3 | 251 |
| Code Generation | MBPP | Pass@1 | 86.2 | 175 |
| Code Generation | MBPP | Accuracy (%) | 88.6 | 146 |
| Mathematical Reasoning | MATH500 | Accuracy (ACC) | 57.2 | 133 |
| Science Question Answering | ScienceQA (SQA) | Accuracy | 0.00e+0 | 128 |
| Chat | AlpacaEval | Win Rate | 1.72e+3 | 25 |
| Visual Question Answering | SQA | Accuracy | 71.4 | 23 |
| Chat | IFEval | Loose Prompt Metric | 29.39 | 15 |