Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

About

Parameter-efficient fine-tuning (PEFT) significantly reduces memory costs when adapting large language models (LLMs) for downstream applications. However, traditional first-order (FO) fine-tuning algorithms incur substantial memory overhead due to the need to store activation values for back-propagation during gradient computation, particularly in long-context fine-tuning tasks. Zeroth-order (ZO) algorithms offer a promising alternative by approximating gradients using finite differences of function values, thus eliminating the need for activation storage. Nevertheless, existing ZO methods struggle to capture the low-rank gradient structure common in LLM fine-tuning, leading to suboptimal performance. This paper proposes a low-rank ZO gradient estimator and introduces a novel low-rank ZO algorithm (LOZO) that effectively captures this structure in LLMs. We provide convergence guarantees for LOZO by framing it as a subspace optimization method. Additionally, its low-rank nature enables LOZO to integrate with momentum techniques while incurring negligible extra memory costs. Extensive experiments across various model sizes and downstream tasks demonstrate that LOZO and its momentum-based variant outperform existing ZO methods and closely approach the performance of FO algorithms.
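The finite-difference idea in the abstract can be illustrated with a small sketch. The code below is not the paper's LOZO algorithm. It is a minimal illustration, assuming a two-point (central) finite difference and a perturbation built as the product of two Gaussian factors, so that every gradient estimate has rank at most `rank`. The names `lowrank_zo_grad` and `demo`, the toy quadratic loss, and all hyperparameter values are hypothetical choices made for this example.

```python
import numpy as np

def lowrank_zo_grad(loss, W, rank=2, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate with a low-rank perturbation.

    Only loss values are used (two forward evaluations); no back-propagation
    and no stored activations. Assumption: the perturbation is Z = U @ V.T
    with Gaussian factors U (m x rank) and V (n x rank).
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = W.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    Z = U @ V.T  # perturbation direction of rank at most `rank`
    # Central finite difference of the loss along Z.
    delta = (loss(W + eps * Z) - loss(W - eps * Z)) / (2.0 * eps)
    # E[Z_ij * Z_kl] = rank when (i, j) == (k, l) and 0 otherwise, so dividing
    # by `rank` makes the estimate unbiased for a quadratic loss.
    return (delta / rank) * Z

def demo(steps=2000, lr=2e-3, seed=0):
    """Run plain ZO gradient descent on a toy quadratic loss."""
    rng = np.random.default_rng(seed)
    target = rng.standard_normal((8, 6))
    loss = lambda W: 0.5 * np.sum((W - target) ** 2)
    W = np.zeros((8, 6))
    initial = loss(W)
    for _ in range(steps):
        W -= lr * lowrank_zo_grad(loss, W, rank=2, eps=1e-3, rng=rng)
    return initial, loss(W)

if __name__ == "__main__":
    initial, final = demo()
    print(f"loss before: {initial:.4f}, loss after: {final:.6f}")
```

Each update costs two loss evaluations, and the only extra state is the two small factors `U` and `V`, which is the memory argument the abstract makes for ZO methods. How LOZO samples its factors, and how it combines them with momentum, is described in the paper itself and is not reproduced here.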

Yiming Chen, Yuan Zhang, Liyuan Cao, Kun Yuan, Zaiwen Wen • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | CIFAR-100 | Accuracy 61.8 | 302 |
| Natural Language Understanding | SuperGLUE | -- | 84 |
| Natural Language Understanding | GLUE and SuperGLUE (test val) | SST-2 86.6 | 37 |
| Natural Language Understanding | SuperGLUE | SST-2 Accuracy 92.5 | 18 |
| Natural Language Understanding | GLUE & SuperGLUE (test) | RTE Accuracy 69.7 | 17 |
| Question Answering | SQuAD v1.1 v2.0 (test dev) | F1 Score 77.3 | 8 |
| Sentiment Analysis | SST-2 GLUE (test dev) | Accuracy 92.2 | 8 |
| Natural Language Inference | RTE GLUE (test dev) | Accuracy 56.3 | 8 |
| Natural Language Inference | CB SuperGLUE (test dev) | Accuracy 57.1 | 8 |
| Question Answering | BoolQ SuperGLUE (test dev) | Accuracy 65 | 8 |
Showing 10 of 14 rows
