Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems

About

Sharpness-aware minimization (SAM) improves generalization of various deep learning tasks. Motivated by popular architectures such as LoRA, we explore the implicit regularization of SAM for scale-invariant problems involving two groups of variables. Instead of focusing on commonly used sharpness, this work introduces a concept termed balancedness, defined as the difference between the squared norm of two variables. This allows us to depict richer global behaviors of SAM. In particular, our theoretical and empirical findings reveal that i) SAM promotes balancedness; and ii) the regularization on balancedness is data-responsive -- outliers have stronger impact. The latter coincides with empirical observations that SAM outperforms SGD in the presence of outliers. Leveraging the implicit regularization, we develop a resource-efficient SAM variant, balancedness-aware regularization (BAR), tailored for scale-invariant problems such as finetuning language models with LoRA. BAR saves 95% computational overhead of SAM, with enhanced test performance across various tasks on RoBERTa, GPT2, and OPT-1.3B.

Bingcong Li, Liang Zhang, Niao He• 2024

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	Accuracy59.72	1398
Natural Language Inference	SNLI (test)	Accuracy84.9	694
Natural Language Understanding	GLUE (test)	SST-2 Accuracy96	416
Sentiment Classification	SST2 (test)	Accuracy91.5	233
Sentiment Analysis	SST-5 (test)	Accuracy55	177
Question Classification	TREC (test)	Accuracy96.7	128
Natural Language Inference	RTE (test)	Accuracy81	52
Natural Language Inference	MNLI (test)	Accuracy0.783	52
Dialogue	MT-Bench	MT-Bench Score6.1	41
Instruction Following	BBH	--	40

Showing 10 of 20 rows

Other info

Code

Follow for update

@wizwand_team Discord