Stochastic convergence of parallel asynchronous adaptive first-order methods

About

A new class of asynchronous adaptive first-order optimization methods is introduced, comprising asynchronous variants of several popular algorithms. Versions of these methods using momentum and/or inexact normalization are also considered. The convergence of methods in the class on non-convex functions is analyzed in a fully stochastic setting, and is shown to be (up to logarithmic factors) of order O(1/√t) under reasonable assumptions. Numerical experiments suggest that such asynchronous adaptive algorithms are very relevant in heterogeneous large-scale machine learning systems.
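
To make the idea of an asynchronous adaptive first-order method concrete, here is a minimal single-process sketch. It is not the authors' algorithm: the abstract does not name the "popular algorithms" it covers, so AdaGrad is assumed here as one plausible example. The delay model (a randomly chosen worker delivers a gradient computed at its last parameter snapshot), the toy non-convex objective, and all names (`stochastic_grad`, `async_adagrad`, `true_grad_norm`) are hypothetical choices made only for illustration.

```python
import numpy as np


def stochastic_grad(x, rng, noise=0.01):
    """Noisy gradient of the toy non-convex f(x) = sum(x_i^2 / (1 + x_i^2))."""
    return 2.0 * x / (1.0 + x**2) ** 2 + noise * rng.standard_normal(x.shape)


def true_grad_norm(x):
    """Euclidean norm of the exact (noise-free) gradient of the toy objective."""
    x = np.asarray(x, dtype=float)
    return float(np.linalg.norm(2.0 * x / (1.0 + x**2) ** 2))


def async_adagrad(x0, n_workers=4, steps=2000, lr=0.5, eps=1e-8, seed=0):
    """Simulate AdaGrad driven by stale gradients (assumed delay model)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    accum = np.zeros_like(x)  # running sum of squared gradients
    # Each worker holds the parameter snapshot it last read from the server.
    snapshots = [x.copy() for _ in range(n_workers)]
    for _ in range(steps):
        w = rng.integers(n_workers)             # worker that finishes now
        g = stochastic_grad(snapshots[w], rng)  # gradient at a stale iterate
        accum += g * g
        x -= lr * g / (np.sqrt(accum) + eps)    # AdaGrad-style normalized step
        snapshots[w] = x.copy()                 # worker reads fresh parameters
    return x
```

Calling `async_adagrad([2.0, -1.5, 0.5])` and comparing `true_grad_norm` before and after gives a rough feel for how the gradient norm shrinks despite the stale gradients. The sketch says nothing about the O(1/√t) rate itself, which the paper establishes under its own assumptions.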

Serge Gratton, Philippe L. Toint • 2026

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	FashionMNIST (test)	Accuracy	87.34	461
Image Classification	SVHN (test)	Accuracy	69.65	98
Click-Through Rate Prediction	Criteo	AUC	0.717	44
Rating Prediction	MovieLens	RMSE	0.916	18
Classification	Covtype (test)	Accuracy	89.56	3
Image Classification	MoE-FMNIST (test)	Accuracy	87.39	3
Optimization Convergence	FashionMNIST (train)	Final Training Loss Gradient Magnitude	0.277	3
Optimization Convergence	MovieLens (train)	Final Training Loss Gradient	0.021	3
Optimization Convergence	Criteo (train)	Final Training Loss Gradient Magnitude	0.033	3
Optimization Convergence	MoE-FMNIST (train)	Final Training Loss Gradient	0.744	3

Showing 10 of 12 rows
