Flatness-Aware Stochastic Gradient Langevin Dynamics

About

Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases its learning dynamics toward flat basins while retaining the computational and memory efficiency of SGD and SGLD. We provide a non-asymptotic theoretical analysis showing that fSGLD targets a flatness-biased Gibbs distribution under a theoretically prescribed coupling between the noise scale $\sigma$ and the inverse temperature $\beta$, together with explicit excess risk guarantees. We empirically evaluate fSGLD across standard optimizer benchmarks, Bayesian image classification, uncertainty quantification, and out-of-distribution detection, demonstrating consistently strong performance and reliable uncertainty estimates. Additional experiments confirm the effectiveness of the theoretically prescribed $\beta$-$\sigma$ coupling compared to decoupled choices.

Stefano Bruno, Youngsik Hwang, Jaehyeon An, Sotirios Sabanis, Dong-Young Lim • 2025
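
The abstract does not spell out the fSGLD update rule or the prescribed $\beta$-$\sigma$ coupling, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that the flatness bias comes from evaluating the gradient at a Gaussian-perturbed copy of the parameters (with noise scale `sigma`) and then taking an ordinary SGLD step with inverse temperature `beta`. The function names, the toy quadratic loss, and the independently chosen values of `sigma` and `beta` are all hypothetical.

```python
import numpy as np


def perturbed_sgld_step(theta, grad_fn, rng, step_size, beta, sigma):
    """One Langevin step with the gradient taken at a randomly perturbed point.

    Assumption (not stated in the abstract): the flatness bias is obtained by
    perturbing the parameters with isotropic Gaussian noise of scale `sigma`.
    """
    # Random weight perturbation used only for the gradient evaluation.
    eps = sigma * rng.standard_normal(theta.shape)
    grad = grad_fn(theta + eps)
    # Standard SGLD injection noise with variance 2 * step_size / beta.
    noise = np.sqrt(2.0 * step_size / beta) * rng.standard_normal(theta.shape)
    return theta - step_size * grad + noise


def toy_grad(theta):
    # Gradient of the hypothetical toy loss U(theta) = 0.5 * ||theta||^2.
    return theta


def run(n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.full(2, 5.0)
    for _ in range(n_steps):
        # sigma and beta are picked independently here; the paper prescribes
        # a specific coupling between them that this sketch does not encode.
        theta = perturbed_sgld_step(
            theta, toy_grad, rng, step_size=0.01, beta=1e4, sigma=0.05
        )
    return theta


if __name__ == "__main__":
    print(run())
```

On this toy loss the iterates contract toward the origin and then fluctuate around it at a scale set by `beta`, `sigma`, and the step size. In a real model, `grad_fn` would be a minibatch gradient.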

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100-N	Accuracy 75.67	62
Out-of-Distribution Detection	CIFAR100 (ID) SVHN (OOD)	AUROC 80.52	53
Out-of-Distribution Detection	CIFAR-10 vs SVHN	AUC 0.9891	38
Image Classification	WebVision	Top-1 Acc 73.95	21
Image Classification	CIFAR-100 standard (test)	Acc 78.53	16
Image Classification	CIFAR-10N	Accuracy 96.45	15
Image Classification	CIFAR10 standard (test)	Accuracy 95.73	8
