SAFE: Finding Sparse and Flat Minima to Improve Pruning
About
Sparsifying neural networks often suffers from seemingly inevitable performance degradation, and restoring the original performance remains challenging despite much recent progress. Motivated by recent studies in robust optimization, we aim to tackle this problem by finding subnetworks that are both sparse and flat at the same time. Specifically, we formulate pruning as a sparsity-constrained optimization problem in which flatness is encouraged as an objective. We solve it explicitly via an augmented Lagrange dual approach and extend it further by proposing a generalized projection operation, resulting in a novel pruning method called SAFE and its extension, SAFE$^+$. Extensive evaluations on standard image classification and language modeling tasks reveal that SAFE consistently yields sparse networks with improved generalization performance, performing competitively with well-established baselines. In addition, SAFE demonstrates resilience to noisy data, making it well-suited for real-world conditions.
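To make the overall setup concrete, here is a minimal toy sketch of sparsity-constrained optimization with a flatness-encouraging gradient, in the spirit of the abstract. It is not the paper's augmented-Lagrangian algorithm: the projection shown is plain magnitude-based hard thresholding, the flatness term is a generic SAM-style perturbed gradient, and the quadratic loss, `rho`, and learning rate are illustrative assumptions.

```python
import numpy as np

def project_topk(w, k):
    """Project onto the sparsity constraint ||w||_0 <= k by keeping the
    k largest-magnitude entries (hard thresholding). Illustrative stand-in
    for the paper's generalized projection."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]
    out[keep] = w[keep]
    return out

def flat_grad(w, grad_fn, rho=0.05):
    """SAM-style flatness-aware gradient: evaluate the gradient at an
    adversarially perturbed point w + rho * g / ||g||."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return grad_fn(w + eps)

# Toy quadratic loss 0.5 * w^T A w with gradient A @ w (assumed example).
A = np.diag([1.0, 10.0, 0.1, 5.0])
grad_fn = lambda w: A @ w

w = np.array([1.0, -2.0, 0.5, 3.0])
w0 = w.copy()
lr, k = 0.01, 2
for _ in range(100):
    # Flatness-aware gradient step, then projection back onto the sparse set.
    w = project_topk(w - lr * flat_grad(w, grad_fn), k)
```

After the loop, `w` has at most `k` nonzero entries and a lower loss than the dense starting point, illustrating how the projected iterates stay feasible while still descending.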
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 52.15 | 1460 |
| Question Answering | ARC Challenge | Accuracy | 38.14 | 749 |
| Question Answering | ARC Easy | Accuracy | 72.14 | 386 |
| Natural Language Inference | RTE | Accuracy | 57.04 | 367 |
| Language Modeling | C4 | Perplexity | 7.82 | 321 |
| Language Modeling | Wiki | Perplexity (PPL) | 5.73 | 251 |
| Question Answering | BoolQ | Accuracy | 74.83 | 240 |
| Question Answering | OpenBookQA | Accuracy | 26 | 84 |
| Zero-shot Accuracy | ARC Easy | Zero-shot Accuracy | 66.84 | 63 |
| Commonsense Reasoning | WinoGrande | Accuracy | 66.77 | 45 |