
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

About

Parameter-efficient fine-tuning (PEFT) significantly reduces memory costs when adapting large language models (LLMs) for downstream applications. However, traditional first-order (FO) fine-tuning algorithms incur substantial memory overhead due to the need to store activation values for back-propagation during gradient computation, particularly in long-context fine-tuning tasks. Zeroth-order (ZO) algorithms offer a promising alternative by approximating gradients using finite differences of function values, thus eliminating the need for activation storage. Nevertheless, existing ZO methods struggle to capture the low-rank gradient structure common in LLM fine-tuning, leading to suboptimal performance. This paper proposes a low-rank ZO gradient estimator and introduces a novel low-rank ZO algorithm (LOZO) that effectively captures this structure in LLMs. We provide convergence guarantees for LOZO by framing it as a subspace optimization method. Additionally, its low-rank nature enables LOZO to integrate with momentum techniques while incurring negligible extra memory costs. Extensive experiments across various model sizes and downstream tasks demonstrate that LOZO and its momentum-based variant outperform existing ZO methods and closely approach the performance of FO algorithms.

Yiming Chen, Yuan Zhang, Liyuan Cao, Kun Yuan, Zaiwen Wen • 2024
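
The finite-difference idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' LOZO implementation: the function name `zo_gradient`, the Gaussian factors `U` and `V`, the `1/sqrt(rank)` scaling, and the step size `eps` are all hypothetical choices made for this example. It contrasts a dense random perturbation with a low-rank one of the form `U @ V.T`; both need only forward evaluations of the loss, so no activations have to be stored for back-propagation.

```python
import numpy as np


def zo_gradient(loss, X, eps=1e-3, rank=None, rng=None):
    """Two-point zeroth-order gradient estimate for a weight matrix X.

    rank=None  -> dense Gaussian perturbation Z (generic ZO estimator).
    rank=r     -> Z = U @ V.T / sqrt(r) with Gaussian factors, so the
                  estimate has rank at most r (illustrative assumption,
                  not necessarily the paper's exact estimator).
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    if rank is None:
        Z = rng.standard_normal((m, n))
    else:
        U = rng.standard_normal((m, rank))
        V = rng.standard_normal((n, rank))
        Z = U @ V.T / np.sqrt(rank)
    # Central finite difference of the loss along direction Z.
    # Only two forward evaluations are needed; no back-propagation.
    scale = (loss(X + eps * Z) - loss(X - eps * Z)) / (2.0 * eps)
    return scale * Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((8, 6))
    X = np.zeros((8, 6))

    def loss(W):
        # Toy quadratic objective standing in for a fine-tuning loss.
        return 0.5 * float(np.sum((W - target) ** 2))

    # Plain ZO-SGD loop using the low-rank estimator.
    for _ in range(2000):
        X -= 0.01 * zo_gradient(loss, X, rank=2, rng=rng)
    print("final loss:", loss(X))
```

In the low-rank case only the factors `U` and `V` need to be kept to reproduce the perturbation, which is the kind of structure that makes adding momentum cheap in memory; how LOZO actually samples and reuses these factors is specified in the paper, not here.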

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Inference | RTE | Accuracy 78.7 | 590 |
| Image Classification | CIFAR-100 | Accuracy 61.8 | 302 |
| Question Classification | TREC | Accuracy 89.8 | 262 |
| Common Sense Reasoning | COPA | Accuracy 91 | 256 |
| Natural Language Inference | SNLI | Accuracy 82.5 | 196 |
| Sentiment Analysis | SST-5 | Accuracy 50.4 | 123 |
| Text Classification | BoolQ | Accuracy 68.1 | 118 |
| Natural Language Understanding | SuperGLUE | -- | 84 |
| Classification | CB | Accuracy 69.6 | 70 |
| Natural Language Understanding | GLUE and SuperGLUE (test val) | SST-2 86.6 | 37 |

Showing 10 of 37 rows
