On Adaptivity in Zeroth-Order Optimization

About

We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no convergence advantage over well-tuned ZO-SGD, while incurring significant memory overhead. Our analysis reveals that in high dimensions, ZO gradients lack coordinate-wise heterogeneity, rendering adaptive mechanisms memory inefficient. Leveraging this insight, we propose MEAZO, a memory-efficient adaptive ZO optimizer that tracks only a single scalar for global step size adaptation. We support our method with theoretical convergence guarantees under standard assumptions. Experiments across multiple LLM families and tasks demonstrate that MEAZO matches ZO-Adam's performance with the memory footprint of ZO-SGD. Additional experiments on synthetic quadratic problems and LLM fine-tuning further demonstrate MEAZO's enhanced robustness to step size choices, particularly in grouped or block-structured optimization settings.
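The idea of adapting only a single global scalar can be illustrated with a minimal sketch. This is not the authors' MEAZO algorithm, whose exact update rule is not given in the abstract. It assumes a standard two-point Gaussian-perturbation ZO gradient estimate and an Adam-style running average of the squared projected gradient, stored as one scalar. The function name `zo_scalar_adaptive` and all hyperparameters (`lr`, `mu`, `beta`, `eps`) are hypothetical choices for illustration.

```python
import numpy as np

def zo_scalar_adaptive(f, x0, steps=2000, lr=0.01, mu=1e-3,
                       beta=0.99, eps=1e-8, seed=0):
    """Illustrative ZO optimizer with a single scalar of adaptive state.

    Assumption: this is a generic sketch, not the MEAZO update from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = 0.0  # the only optimizer state: one scalar, not a d-dimensional vector
    for t in range(1, steps + 1):
        # Random direction; the ZO gradient estimate is c * u.
        u = rng.standard_normal(x.shape)
        # Two-point finite difference: projected gradient along u.
        c = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)
        # Running average of c^2 plays the role of a global second moment.
        v = beta * v + (1.0 - beta) * c * c
        v_hat = v / (1.0 - beta ** t)  # Adam-style bias correction
        # Same rescaling for every coordinate: a global step-size adaptation.
        x -= lr * c / (np.sqrt(v_hat) + eps) * u
    return x

if __name__ == "__main__":
    # Toy synthetic quadratic, f(x) = 0.5 * sum(a_i * x_i^2).
    a = np.linspace(0.5, 2.0, 10)
    f = lambda x: 0.5 * float(np.sum(a * x * x))
    x0 = np.ones(10)
    x = zo_scalar_adaptive(f, x0)
    print("initial loss:", f(x0), "final loss:", f(x))
```

Because the estimate is a scalar `c` times the direction `u`, a per-coordinate second moment would carry little information beyond `c * c` when the dimension is large. This is the intuition the abstract gives for why a single scalar can stand in for ZO-Adam's per-parameter state.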

Hassan Dbouk, Nidham Gazagnadou, Matthias Reisser, Christos Louizos • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | Oxford-IIIT Pets (test) | Mean Accuracy 79.55 | 177 |
| Question Answering | SQuAD (test) | F1 85.75 | 156 |
| Question Answering | SQuAD | F1 Score 82.94 | 63 |
| Summarization | Xsum | ROUGE-L 27.31 | 42 |
