
Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds

About

Euclidean gradient descent algorithms capture little of the geometry of objective function-induced hypersurfaces and risk driving update trajectories off the hypersurface. Riemannian gradient descent algorithms address these issues but cannot represent complex hypersurfaces with a single classic manifold. We propose geodesic gradient descent (GGD), a generic, learning-rate-free Riemannian gradient descent algorithm. At each iteration, GGD approximates a local neighborhood of the objective function-induced hypersurface with an n-dimensional sphere, adapting to arbitrarily complex geometries. A tangent vector derived from the Euclidean gradient is projected onto the sphere to form a geodesic, ensuring that the update trajectory stays on the hypersurface; the parameters are updated to the endpoint of this geodesic. Because the maximum step size of the gradient in GGD equals a quarter of the arc length on the n-dimensional sphere, no learning rate is needed. Experimental results show that, compared with the classic Adam algorithm, GGD reduces test MSE by 35.79% to 48.76% for fully connected networks on the Burgers' dataset, and cross-entropy loss by 3.14% to 11.59% for convolutional neural networks on the MNIST dataset.
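The core update the abstract describes can be sketched as follows. This is an illustrative NumPy sketch, not the authors' reference implementation: it assumes the center and radius of the local approximating sphere are already known (the paper derives them from the hypersurface geometry), projects the negative gradient onto the tangent space at the current point, and follows the spherical geodesic (exponential map), capping the step angle at pi/2, i.e. a quarter of the great circle, which is what removes the learning rate.

```python
import numpy as np

def ggd_step(x, grad, center, radius):
    """One GGD-style update on a local approximating n-sphere (sketch).

    x      : current point, assumed to lie on the sphere
    grad   : Euclidean gradient at x
    center : center of the local approximating sphere (assumed given)
    radius : radius of that sphere (assumed given)
    """
    u = (x - center) / radius                # unit outward normal at x
    d = -grad                                # descent direction
    v = d - np.dot(d, u) * u                 # tangential component at x
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:                       # gradient normal to sphere: no move
        return x
    w = v / norm_v                           # unit tangent direction
    # Map gradient magnitude to a step angle, capped at pi/2
    # (a quarter of the great-circle arc), so no learning rate is needed.
    theta = min(norm_v / radius, np.pi / 2)
    # Exponential map on the sphere: endpoint of the geodesic.
    return center + radius * (np.cos(theta) * u + np.sin(theta) * w)
```

By construction the returned point satisfies ||x_new - center|| = radius, so the update never leaves the approximating sphere, which is the property GGD uses to keep trajectories on the hypersurface.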

Liwei Hu, Guangyao Li, Wenyong Wang, Xiaoming Zhang, Yu Xiang • 2026

Related benchmarks

Task                  Dataset                    Result                       Rank
Image Classification  MNIST (train)              Training Loss 0.0034         38
Image Classification  MNIST (test)               Test Cross-Entropy 0.0278    18
Regression            Burgers' dataset (train)   MSE 5.12e-7                  18
Regression            Burgers' dataset (test)    MSE 1.80e-4                  18
