
Flatness-Aware Stochastic Gradient Langevin Dynamics

About

Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases its dynamics toward flat basins while retaining the computational and memory efficiency of SGD and SGLD. We provide a non-asymptotic theoretical analysis showing that fSGLD targets a flatness-biased Gibbs distribution under a theoretically prescribed coupling between the noise scale $\sigma$ and the inverse temperature $\beta$, together with explicit excess risk guarantees. We empirically evaluate fSGLD across standard optimizer benchmarks, Bayesian image classification, uncertainty quantification, and out-of-distribution detection, demonstrating consistently strong performance and reliable uncertainty estimates. Additional experiments confirm the effectiveness of the theoretically prescribed $\beta$-$\sigma$ coupling compared to decoupled choices.
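The abstract does not spell out the update rule, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the flatness bias comes from evaluating the gradient at randomly perturbed weights $\theta + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, combined with the usual SGLD noise of variance $2\lambda/\beta$ for step size $\lambda$. The names `fsgld_step_sketch`, `run_sketch`, and `grad_fn` are hypothetical, and `sigma` is passed in by hand because the prescribed $\beta$-$\sigma$ coupling is not given here.

```python
import numpy as np


def fsgld_step_sketch(theta, grad_fn, lr, beta, sigma, rng):
    """One illustrative step: SGLD with the gradient taken at perturbed weights.

    Assumption (not stated in the abstract): the flatness bias is obtained by
    evaluating the gradient at theta + eps, with eps ~ N(0, sigma^2 I).
    """
    # Random weight perturbation; a single gradient evaluation per step,
    # so the cost matches plain SGD/SGLD.
    eps = sigma * rng.standard_normal(theta.shape)
    grad = grad_fn(theta + eps)
    # Standard SGLD injection: Gaussian noise with variance 2 * lr / beta.
    noise = np.sqrt(2.0 * lr / beta) * rng.standard_normal(theta.shape)
    return theta - lr * grad + noise


def run_sketch(theta0, grad_fn, n_steps, lr, beta, sigma, seed=0):
    """Iterate the sketch step from theta0 with a seeded generator."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = fsgld_step_sketch(theta, grad_fn, lr, beta, sigma, rng)
    return theta


if __name__ == "__main__":
    # Toy quadratic loss U(theta) = 0.5 * ||theta||^2, whose gradient is theta.
    final = run_sketch(np.full(5, 3.0), lambda t: t,
                       n_steps=500, lr=0.1, beta=1e6, sigma=0.01)
    print(np.linalg.norm(final))
```

In this sketch `sigma` and `beta` are independent arguments, which corresponds to the "decoupled" setting the abstract contrasts against; a coupled variant would compute `sigma` from `beta` using the relation derived in the paper.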

Stefano Bruno, Youngsik Hwang, Jaehyeon An, Sotirios Sabanis, Dong-Young Lim • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Classification | CIFAR-100-N | Accuracy | 75.67 | 62 |
| Out-of-Distribution Detection | CIFAR-10 vs SVHN | AUC | 0.9891 | 38 |
| Out-of-Distribution Detection | CIFAR100 (ID) SVHN (OOD) | AUROC | 80.52 | 36 |
| Image Classification | WebVision | Top-1 Acc | 73.95 | 21 |
| Image Classification | CIFAR-100 standard (test) | Acc | 78.53 | 16 |
| Image Classification | CIFAR-10N | Accuracy | 96.45 | 15 |
| Image Classification | CIFAR10 standard (test) | Accuracy | 95.73 | 8 |
