On Adaptivity in Zeroth-Order Optimization

About

We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no convergence advantage over well-tuned ZO-SGD, while incurring significant memory overhead. Our analysis reveals that in high dimensions, ZO gradients lack coordinate-wise heterogeneity, so coordinate-wise adaptive mechanisms consume memory without a matching benefit. Leveraging this insight, we propose MEAZO, a memory-efficient adaptive ZO optimizer that tracks only a single scalar for global step size adaptation. We support our method with theoretical convergence guarantees under standard assumptions. Experiments across multiple LLM families and tasks demonstrate that MEAZO matches ZO-Adam's performance with the memory footprint of ZO-SGD. Additional experiments on synthetic quadratic problems and LLM fine-tuning further demonstrate MEAZO's enhanced robustness to step size choices, particularly in grouped or block-structured optimization settings.
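The abstract contrasts three amounts of optimizer state: none (ZO-SGD), two per-coordinate vectors (ZO-Adam), and a single scalar (MEAZO). The Python sketch below is illustrative only and rests on assumptions not stated above. Gradients are estimated with a two-point, Gaussian-direction (SPSA-style) estimator, and `zo_scalar_adaptive` is a hypothetical single-scalar rule (an RMSProp-like running average of the squared directional derivative), not the MEAZO update, which the abstract does not specify. The hypothetical helper `zo_coordinate_second_moments` uses the Gaussian identity E[(∇f·z)² z_i²] = ‖∇f‖² + 2(∂_i f)². Under that identity every coordinate shares the ‖∇f‖² term, which is one way to see why per-coordinate second moments carry little information in high dimensions. All function names are hypothetical.

```python
import math
import random


def directional_derivative(f, x, z, eps=1e-3):
    """Two-point (central-difference) estimate of grad f(x) . z.

    For a quadratic f this is exact up to floating-point rounding.
    """
    x_plus = [xi + eps * zi for xi, zi in zip(x, z)]
    x_minus = [xi - eps * zi for xi, zi in zip(x, z)]
    return (f(x_plus) - f(x_minus)) / (2.0 * eps)


def zo_sgd(f, x0, lr, steps, eps=1e-3, seed=0):
    """ZO-SGD: x <- x - lr * g * z with z ~ N(0, I); keeps no optimizer state."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        z = [rng.gauss(0.0, 1.0) for _ in x]
        g = directional_derivative(f, x, z, eps)
        x = [xi - lr * g * zi for xi, zi in zip(x, z)]
    return x


def zo_scalar_adaptive(f, x0, lr, steps, beta=0.99, eps=1e-3, delta=1e-8, seed=0):
    """Hypothetical scalar-adaptive ZO method (illustrative, NOT the MEAZO rule).

    The only optimizer state is the float v: an exponential moving average of
    the squared directional derivative g**2, used to rescale one global step.
    A ZO-Adam-style method would instead keep two length-d moment vectors.
    """
    rng = random.Random(seed)
    x = list(x0)
    v = 0.0
    for t in range(1, steps + 1):
        z = [rng.gauss(0.0, 1.0) for _ in x]
        g = directional_derivative(f, x, z, eps)
        v = beta * v + (1.0 - beta) * g * g
        v_hat = v / (1.0 - beta ** t)  # Adam-style bias correction
        step = lr / (math.sqrt(v_hat) + delta)
        x = [xi - step * g * zi for xi, zi in zip(x, z)]
    return x


def zo_coordinate_second_moments(grad):
    """Exact E[(g_hat_i)^2] for g_hat = (grad . z) z with z ~ N(0, I).

    E[(grad . z)^2 z_i^2] = ||grad||^2 + 2 * grad_i^2, so all coordinates share
    the ||grad||^2 term and the max/min ratio can never exceed 3.
    """
    norm_sq = sum(gi * gi for gi in grad)
    return [norm_sq + 2.0 * gi * gi for gi in grad]


if __name__ == "__main__":
    d = 20
    a = [1.0 + 9.0 * i / (d - 1) for i in range(d)]  # curvatures from 1 to 10

    def quadratic(x):
        return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

    x0 = [1.0] * d
    print("initial loss:", quadratic(x0))
    print("ZO-SGD loss:", quadratic(zo_sgd(quadratic, x0, lr=0.005, steps=3000)))
    print("scalar-adaptive loss:",
          quadratic(zo_scalar_adaptive(quadratic, x0, lr=0.01, steps=3000)))

    grad = [ai * xi for ai, xi in zip(a, x0)]  # true gradient at x0
    m = zo_coordinate_second_moments(grad)
    print("true grad^2 max/min:", max(gi * gi for gi in grad) / min(gi * gi for gi in grad))
    print("ZO second-moment max/min:", max(m) / min(m))
```

Run as a script, it optimizes a 20-dimensional quadratic with curvatures from 1 to 10. At the starting point the squared true-gradient entries differ by a factor of 100, while the ZO per-coordinate second moments differ by only about 1.26, so a per-coordinate preconditioner would have almost nothing to adapt to in this toy setting.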

Hassan Dbouk, Nidham Gazagnadou, Matthias Reisser, Christos Louizos • 2026

Related benchmarks

Task	Dataset	Result
Image Classification	Oxford-IIIT Pets (test)	Mean Accuracy: 79.55	177
Question Answering	SQuAD (test)	F1: 85.75	156
Question Answering	SQuAD	F1 Score: 82.94	63
Summarization	XSum	ROUGE-L: 27.31	42
