QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models

About

Language Models (LLMs) are often quantized to lower precision to reduce the memory cost and latency in inference. However, quantization often degrades model performance, thus fine-tuning is required for various down-stream tasks. Traditional fine-tuning methods such as stochastic gradient descent and Adam optimization require backpropagation, which are error-prone in the low-precision settings. To overcome these limitations, we propose the Quantized Zeroth-Order (QuZO) framework, specifically designed for fine-tuning LLMs through low-precision (e.g., 4- or 8-bit) forward passes. Our method can avoid the error-prone low-precision straight-through estimator, and utilizes optimized stochastic rounding to mitigate the increased bias. QuZO simplifies the training process, while achieving results comparable to first-order methods in ${\rm FP}8$ and superior accuracy in ${\rm INT}8$ and ${\rm INT}4$ training. Experiments demonstrate that low-bit training QuZO achieves performance comparable to MeZO optimization on GLUE, Multi-Choice, and Generation tasks, while reducing memory cost by $2.94 \times$ in LLaMA2-7B fine-tuning compared to quantized first-order methods.

Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang• 2025

Related benchmarks

Task	Dataset	Result
Text Classification	BoolQ	Accuracy64.9	124
Text Classification	RTE	Accuracy56.5	110
Classification	SST2	Accuracy72.1	108
Classification	CB	Accuracy41.7	76
Generation	SQuAD	F1 Score30.5	58
Multiple-Choice	COPA	Accuracy77.9	42

Showing 6 of 6 rows

Other info

Follow for update

@wizwand_team Discord