Learning-Rate-Free Learning by D-Adaptation

About

D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step. Our approach is the first hyper-parameter free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available.

Aaron Defazio, Konstantin Mishchenko • 2023
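
To make the idea in the abstract more concrete, here is a minimal 1-D sketch of a dual-averaging loop that maintains a growing lower-bound estimate `d` of the distance from the starting point to the solution and uses it in place of a hand-tuned learning rate. It is only loosely modelled on the dual-averaging form of D-Adaptation: the function name `d_adapt_dual_averaging`, the toy objective |x - 3|, the default `d0=1e-6`, the initial step scale, and the exact indexing of the correction term are all assumptions made for illustration, not the authors' reference implementation. Each step uses a single (sub)gradient evaluation and no line search, as the abstract describes.

```python
import math


def d_adapt_dual_averaging(grad, x0, d0=1e-6, steps=1000):
    """Toy 1-D dual averaging with a D-Adaptation-style distance estimate.

    Illustrative sketch only; constants and indexing are assumptions.
    """
    x, d = x0, d0
    s = 0.0          # running sum of d_k * g_k
    g_sq_sum = 0.0   # running sum of g_k^2
    corr = 0.0       # running sum of gamma_k * d_k^2 * g_k^2
    gamma = None     # step scale, initialised from the first nonzero gradient
    d_hist = [d]
    weighted_x, weight = 0.0, 0.0

    for _ in range(steps):
        g = grad(x)
        # Accumulate the d-weighted average of the iterates.
        weighted_x += d * x
        weight += d
        if g == 0:
            break  # zero subgradient: x already minimises the objective
        if gamma is None:
            gamma = 1.0 / abs(g)  # assumed initial scale
        corr += gamma * d * d * g * g
        s += d * g
        g_sq_sum += g * g
        gamma = 1.0 / math.sqrt(g_sq_sum)
        if s != 0.0:
            # Candidate estimate of the distance to the solution.
            d_hat = (gamma * s * s - corr) / (2.0 * abs(s))
            d = max(d, d_hat)  # the estimate never decreases
        x = x0 - gamma * s     # dual-averaging step from the start point
        d_hist.append(d)

    return weighted_x / weight, d_hist


def abs_grad(x, target=3.0):
    """Subgradient of the hypothetical toy objective |x - target|."""
    return (x > target) - (x < target)


x_avg, d_hist = d_adapt_dual_averaging(abs_grad, x0=0.0)
```

In this sketch `d` starts from a deliberately tiny value and is only ever increased, so the effective step size grows automatically as gradient information accumulates. The returned `x_avg` is the `d`-weighted average of the iterates, and `d_hist` records how the estimate evolves.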

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-10 (test)	Accuracy: 89.6	3381
Image Classification	Food-101 (test)	Accuracy: 72.5	145
Image Classification	ImageNet-100 (test)	Clean Accuracy: 76.9	123
Language Modeling	C4 LLaMA-130M (val)	Perplexity: 18.672	40
Image Classification	CIFAR-10	Latency (ms/iter): 24.16	13
Image Classification	MNIST (test)	Accuracy: 99.58	12
Binary Classification	CIFAR-10 (val)	Peak Validation Range: 90.15	7
Binary Classification	ImageNet (val)	Peak Validation Range: 87.96	7
Image Classification	ImageNet	Runtime (s): 2.94	7
Image Classification	CIFAR-10	Runtime (s): 3.68	7

Showing 10 of 14 rows
