Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer

About

Fine-tuning large language models (LLMs) with classic first-order optimizers entails prohibitive GPU memory due to the backpropagation process. Recent works have turned to zeroth-order optimizers for fine-tuning, which save substantial memory by using two forward passes. However, these optimizers are plagued by the heterogeneity of parameter curvatures across different dimensions. In this work, we propose HiZOO, a diagonal Hessian informed zeroth-order optimizer which is the first work to leverage the diagonal Hessian to enhance zeroth-order optimizer for fine-tuning LLMs. What's more, HiZOO avoids the expensive memory cost and only increases one forward pass per step. Extensive experiments on various models (350M~66B parameters) indicate that HiZOO improves model convergence, significantly reducing training steps and effectively enhancing model accuracy. Moreover, we visualize the optimization trajectories of HiZOO on test functions, illustrating its effectiveness in handling heterogeneous curvatures. Lastly, we provide theoretical proofs of convergence for HiZOO. Code is publicly available at https://anonymous.4open.science/r/HiZOO27F8.

Yanjun Zhao, Sizhe Dang, Haishan Ye, Guang Dai, Yi Qian, Ivor W.Tsang• 2024

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100	Accuracy64.9	302
Common Sense Reasoning	COPA	Accuracy93	256
Question Answering	SQuAD (test)	--	156
Sentiment Analysis	SST-2 (test)	Accuracy85.02	144
Text Classification	BoolQ	Accuracy73.9	118
Text Classification	RTE	Accuracy71.8	104
Classification	SST2	Accuracy92.1	102
Classification	CB	Accuracy69.6	70
Classification	WSC	Accuracy63.5	59
Commonsense Reasoning	COPA (test)	Accuracy80.67	54

Showing 10 of 41 rows

Other info

Follow for update

@wizwand_team Discord