Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness
About
We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM), addressing the challenges posed by both $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic $(L_0,L_1)$-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
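To make one round of the method concrete, below is a minimal NumPy sketch of the structure described above: each worker keeps a local momentum buffer, the server pre-mixes the momenta with Nearest Neighbor Mixing (NNM) and applies a robust aggregator, and the model takes a step along the normalized aggregate. The helper names (`byz_nsgdm_step`, `nnm`), the coordinate-wise median as the base aggregator, and the hyperparameter values are illustrative assumptions, not the paper's exact specification.

```python
# Hedged sketch of a Byz-NSGDM-style round: worker-side momentum, server-side
# NNM pre-mixing plus a robust aggregator, and a normalized update.
# The base aggregator (coordinate-wise median) and all constants are assumptions.
import numpy as np

def nnm(vectors: np.ndarray, n_byz: int) -> np.ndarray:
    """Replace each worker vector by the mean of its (n - n_byz) nearest neighbors."""
    n = vectors.shape[0]
    k = n - n_byz
    dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
    mixed = np.empty_like(vectors)
    for i in range(n):
        neighbors = np.argsort(dists[i])[:k]  # the k closest workers, including i itself
        mixed[i] = vectors[neighbors].mean(axis=0)
    return mixed

def robust_aggregate(vectors: np.ndarray, n_byz: int) -> np.ndarray:
    """NNM pre-mixing followed by a coordinate-wise median (one possible choice)."""
    return np.median(nnm(vectors, n_byz), axis=0)

def byz_nsgdm_step(x, momenta, grads, n_byz, beta=0.9, lr=0.01):
    """One round: update per-worker momenta, robustly aggregate, take a normalized step."""
    momenta = beta * momenta + (1.0 - beta) * grads   # per-worker momentum update
    m_hat = robust_aggregate(momenta, n_byz)          # Byzantine-robust aggregation
    x_new = x - lr * m_hat / (np.linalg.norm(m_hat) + 1e-12)  # normalized step
    return x_new, momenta
```

In this sketch `momenta` and `grads` are `(n, d)` arrays holding one vector per worker; Byzantine workers may place arbitrary vectors in their rows, and the normalization keeps the step size bounded regardless of how large the reported gradients are.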
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Character-level Language Modeling | Shakespeare (val) | Perplexity | 10.08 | 27 |
| Optimization | Synthetic quartic function | Gradient Norm (× 10^-6) | 5.8 | 27 |