# Gradient descent with generalized Newton's method

## About
We propose the generalized Newton's method (GeN), a Hessian-informed approach that applies to any optimizer, such as SGD and Adam, and covers the Newton-Raphson method as a sub-case. Our method automatically and dynamically selects a learning rate that accelerates convergence, without intensive tuning of a learning rate scheduler. In practice, GeN is easy to implement, since it only requires additional forward passes, with almost zero computational overhead in training time and memory once the overhead is amortized over many iterations. We present extensive experiments on language and vision tasks (e.g., GPT and ResNet) showing that GeN optimizers match the state-of-the-art performance achieved with carefully tuned learning rate schedulers.
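The core idea, selecting a learning rate from extra forward passes along the update direction, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the three-point probe, and the quadratic fit `L(eta) ≈ a*eta² + b*eta + c` (whose minimizer is `eta* = -b / 2a`) are illustrative assumptions about how such a Hessian-informed step size could be estimated without any backward passes beyond the usual gradient.

```python
import numpy as np

def gen_learning_rate(loss_fn, params, direction, eta0=0.1):
    """Estimate a loss-minimizing learning rate along `direction`
    by fitting a quadratic L(eta) ~ a*eta^2 + b*eta + c
    to three extra forward passes (no extra backward passes)."""
    # Probe the loss at eta = 0, eta0, and 2*eta0.
    l0 = loss_fn(params)
    l1 = loss_fn(params - eta0 * direction)
    l2 = loss_fn(params - 2 * eta0 * direction)
    # Finite-difference estimates of the quadratic coefficients.
    a = (l2 - 2 * l1 + l0) / (2 * eta0 ** 2)
    b = (4 * l1 - l2 - 3 * l0) / (2 * eta0)
    if a <= 0:
        # Concave fit along this direction: fall back to the probe step.
        return eta0
    return -b / (2 * a)

# Toy example: quadratic loss, update direction = gradient (plain SGD).
loss = lambda w: float(np.sum(w ** 2))
w = np.array([3.0, -2.0])
g = 2 * w                       # gradient of the loss at w
eta_star = gen_learning_rate(loss, w, g)
```

For a purely quadratic loss the fit is exact (here `eta_star` is 0.5, the step that jumps straight to the minimum); for general losses it is a local second-order approximation, so the rate adapts from iteration to iteration.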
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy | 77.7 | 3381 |
| Image Classification | ImageNet-100 (test) | Clean Accuracy | 55 | 109 |
| Image Classification | Food-101 (test) | -- | -- | 89 |
| Image Classification | CIFAR-10 | Latency (ms/iter) | 22.29 | 13 |
| Image Classification | MNIST (test) | Accuracy | 99.21 | 12 |
| Instance Segmentation | MS-COCO 2017 (test) | Box mAP50 | 29.7 | 6 |
| Keypoint Detection | MS-COCO 2017 (test) | mAP50 (Box) | 62.9 | 6 |