Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

About

By driving models to converge to flat minima, sharpness-aware learning algorithms (such as SAM) have achieved state-of-the-art performance. However, these algorithms generally incur one extra forward-backward propagation at each training iteration, which adds a heavy computational burden, especially for large-scale models. To this end, we propose a simple yet efficient training scheme called Randomized Sharpness-Aware Training (RST). At each iteration, an optimizer in RST performs a Bernoulli trial to choose randomly between a base algorithm (SGD) and a sharpness-aware algorithm (SAM), with a probability set by a predefined scheduling function. Because base-algorithm steps are mixed in, the overall number of propagation pairs can be greatly reduced. We also give a theoretical analysis of the convergence of RST. We then empirically study the computation cost and the effect of various types of scheduling functions, and give guidance on choosing appropriate ones. Further, we extend RST to a general framework (G-RST), in which the degree of sharpness regularization can be adjusted freely for any scheduling function. We show that G-RST can outperform SAM in most cases while saving 50% of the extra computation cost.
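
The per-iteration Bernoulli choice described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy quadratic loss, the names `constant_schedule`, `linear_schedule`, `sam_gradient`, and `train_rst`, and the values of `lr` and `rho` are all assumptions made for illustration. The SAM step uses the commonly described two-pass form (ascend by `rho` along the normalized gradient, then take the gradient at the perturbed point).

```python
import math
import random

# Hypothetical toy loss f(w) = 0.5 * sum(a_i * w_i^2); not taken from the paper.
CURVATURE = [1.0, 4.0]

def loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURE, w))

def grad(w):
    # One call stands in for one forward-backward propagation pair.
    return [a * wi for a, wi in zip(CURVATURE, w)]

def constant_schedule(p):
    """Assumed scheduling function: the same SAM probability at every step."""
    return lambda step, total_steps: p

def linear_schedule(p_start, p_end):
    """Assumed scheduling function: probability moves linearly over training."""
    def schedule(step, total_steps):
        frac = step / max(total_steps - 1, 1)
        return p_start + (p_end - p_start) * frac
    return schedule

def sam_gradient(w, rho):
    """Two-pass SAM gradient: perturb toward higher loss, then differentiate."""
    g = grad(w)                                   # first propagation pair
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return g
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    return grad(w_adv)                            # second propagation pair

def train_rst(w, schedule, total_steps, lr=0.1, rho=0.05, seed=0):
    """Run randomized training; return final weights and propagation count."""
    rng = random.Random(seed)
    propagations = 0
    for step in range(total_steps):
        p_sam = schedule(step, total_steps)
        if rng.random() < p_sam:                  # Bernoulli trial: SAM step
            g = sam_gradient(w, rho)
            propagations += 2
        else:                                     # otherwise: plain SGD step
            g = grad(w)
            propagations += 1
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, propagations

if __name__ == "__main__":
    w_final, cost = train_rst([1.0, 1.0], constant_schedule(0.5), total_steps=100)
    print("final loss:", loss(w_final), "propagation pairs:", cost)
```

Under a constant probability p, the expected number of propagation pairs over T steps is (1 + p) T, so p = 0 recovers the cost of SGD and p = 1 recovers the cost of SAM.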

Yang Zhao, Hao Zhang, Xiuyuan Hu • 2022

Related benchmarks

Task | Dataset | Result | Rank
Commonsense Reasoning | HellaSwag | Accuracy 70.33 | 1460
Question Answering | OpenBookQA | Accuracy 36.2 | 465
Natural Language Inference | RTE | Accuracy 72.56 | 367
Boolean Question Answering | BoolQ | Accuracy 79.24 | 307
Science Question Answering | ARC Challenge | Accuracy 43.94 | 234
Natural Language Understanding | GLUE (test dev) | MRPC Accuracy 92.39 | 81
Multiple-choice Question Answering | MMLU | STEM Accuracy 50.52 | 13
Linguistic Acceptability | COLA | Max Memory (MB) 3.32e+3 | 5
Natural Language Inference | MNLI | Max Memory (MB) 8.08e+3 | 5
Fine-tuning | Open-Platypus | Max Memory (MB) 5.13e+4 | 4