Stochastic convergence of parallel asynchronous adaptive first-order methods
About
A new class of asynchronous adaptive first-order optimization methods is introduced, comprising asynchronous variants of several popular algorithms. Versions of these methods using momentum and/or inexact normalization are also considered. The convergence of methods in the class on non-convex functions is analyzed in a fully stochastic setting, and is shown to be (up to logarithmic factors) of order $\mathcal{O}(1/\sqrt{t})$ under reasonable assumptions. Numerical experiments suggest that such asynchronous adaptive algorithms are very relevant in heterogeneous large-scale machine learning systems.
Serge Gratton, Philippe L. Toint • 2026
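To give a concrete picture of what an asynchronous adaptive first-order step can look like, here is a minimal sketch. It is not the authors' class of methods: it assumes an AdaGrad-style normalization, simulates asynchrony sequentially by evaluating each noisy gradient at a randomly chosen stale iterate, and uses a hypothetical non-convex test function. All names (`async_adagrad`, `grad`, `grad_norm`) and parameter values are illustrative assumptions.

```python
import math
import random
from collections import deque


def grad(x):
    """Gradient of the illustrative non-convex function
    f(x) = sum_i (x_i^2 + 3 sin^2(x_i)), whose only stationary point is 0."""
    return [2.0 * xi + 3.0 * math.sin(2.0 * xi) for xi in x]


def grad_norm(x):
    """Euclidean norm of the exact (noise-free) gradient at x."""
    return math.sqrt(sum(g * g for g in grad(x)))


def async_adagrad(x0, steps=2000, lr=0.2, max_delay=4,
                  noise=0.1, eps=1e-8, seed=0):
    """Sequential simulation of an asynchronous AdaGrad-style method.

    Assumption: asynchrony is modeled by evaluating each stochastic
    gradient at an iterate that is up to `max_delay` updates old.
    """
    rng = random.Random(seed)
    x = list(x0)                      # current shared iterate
    accum = [0.0] * len(x)            # running sum of squared gradients
    history = deque([list(x)], maxlen=max_delay + 1)  # recent iterates

    for _ in range(steps):
        # A worker reads a possibly stale copy of the iterate.
        stale = history[rng.randrange(len(history))]
        # Stochastic gradient: exact gradient plus Gaussian noise.
        g = [gi + rng.gauss(0.0, noise) for gi in grad(stale)]
        # Adaptive (per-coordinate) normalized update on the shared iterate.
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            x[i] -= lr * gi / math.sqrt(accum[i] + eps)
        history.append(list(x))
    return x


if __name__ == "__main__":
    start = [2.0, -1.5, 1.0]
    end = async_adagrad(start)
    print("initial gradient norm:", grad_norm(start))
    print("final gradient norm:  ", grad_norm(end))
```

Because each per-coordinate step is bounded by `lr`, stale gradients cannot blow up the iterate in this toy setting; a real parallel implementation would replace the `history` buffer with actual concurrent workers.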
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | FashionMNIST (test) | Accuracy | 87.34 | 363 |
| Click-Through Rate Prediction | Criteo | AUC | 0.717 | 44 |
| Image Classification | SVHN (test) | Accuracy | 69.65 | 26 |
| Rating Prediction | MovieLens | RMSE | 0.916 | 18 |
| Classification | Covtype (test) | Accuracy | 89.56 | 3 |
| Image Classification | MoE-FMNIST (test) | Accuracy | 87.39 | 3 |
| Optimization Convergence | FashionMNIST (train) | Final Training Loss Gradient Magnitude | 0.277 | 3 |
| Optimization Convergence | MovieLens (train) | Final Training Loss Gradient | 0.021 | 3 |
| Optimization Convergence | Criteo (train) | Final Training Loss Gradient Magnitude | 0.033 | 3 |
| Optimization Convergence | MoE-FMNIST (train) | Final Training Loss Gradient | 0.744 | 3 |
Showing 10 of 12 rows