
Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals

About

In this paper, we present an efficient method for storing fine-tuned models by leveraging the low-rank properties of weight residuals. Our key observation is that weight residuals in large overparameterized models exhibit even stronger low-rank characteristics. Based on this insight, we propose Efficient Residual Encoding (ERE), a novel approach that achieves efficient storage of fine-tuned model weights by approximating the low-rank weight residuals. Furthermore, we analyze the robustness of weight residuals and push the limit of storage efficiency by utilizing additional quantization and layer-wise rank allocation. Our experimental results demonstrate that our method significantly reduces memory footprint while preserving performance in various tasks and modalities. We release our code.
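The core idea in the abstract — storing a fine-tuned model as the pretrained weights plus a low-rank approximation of the weight residual — can be sketched with a truncated SVD. This is a minimal illustration of the general technique, not the authors' released ERE implementation; the function names and the toy rank choice are assumptions for the example.

```python
import numpy as np

def compress_residual(w_finetuned, w_pretrained, rank):
    """Approximate the weight residual (w_finetuned - w_pretrained)
    with its top-`rank` singular components via truncated SVD."""
    residual = w_finetuned - w_pretrained
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    # Store two thin factors instead of the full residual matrix.
    return u[:, :rank] * s[:rank], vt[:rank]

def reconstruct(w_pretrained, us, vt):
    """Recover an approximate fine-tuned weight from the stored factors."""
    return w_pretrained + us @ vt

# Toy example: a residual that is genuinely rank-4.
rng = np.random.default_rng(0)
w0 = rng.standard_normal((64, 64))              # "pretrained" weight
delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
w1 = w0 + delta                                  # "fine-tuned" weight
us, vt = compress_residual(w1, w0, rank=4)
w_rec = reconstruct(w0, us, vt)
```

For a d×d layer, the stored factors cost 2·d·r parameters instead of d², so the savings grow as the residual's effective rank r shrinks; the paper's additional quantization and layer-wise rank allocation push this further.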

Simo Ryu, Seunghyun Seo, Jaejun Yoo • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 84.1 | 1036 |
| Visual Question Answering | GQA | Accuracy | 57 | 505 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 13.3 | 370 |
| Science Question Answering | ScienceQA (SQA) | Accuracy | 0.00e+0 | 273 |
| Code Generation | MBPP | Pass@1 | 86.2 | 193 |
| Code Generation | MBPP | Accuracy (%) | 88.6 | 146 |
| Mathematical Reasoning | MATH500 | Accuracy (ACC) | 57.2 | 133 |
| Visual Question Answering | SQA | Accuracy | 71.4 | 41 |
| Chat | AlpacaEval | Win Rate | 1.72e+3 | 39 |
| Chat | IFEval | Loose Prompt Metric | 29.39 | 15 |

Showing 10 of 13 rows.
