
SAFE: Finding Sparse and Flat Minima to Improve Pruning

About

Sparsifying neural networks often suffers from seemingly inevitable performance degradation, and restoring the original performance remains challenging despite much recent progress. Motivated by recent studies in robust optimization, we aim to tackle this problem by finding subnetworks that are simultaneously sparse and flat. Specifically, we formulate pruning as a sparsity-constrained optimization problem in which flatness is encouraged as an objective. We solve it explicitly via an augmented Lagrange dual approach and extend it further with a generalized projection operation, yielding a novel pruning method called SAFE and its extension, SAFE$^+$. Extensive evaluations on standard image classification and language modeling tasks reveal that SAFE consistently yields sparse networks with improved generalization performance, comparing favorably with well-established baselines. In addition, SAFE demonstrates resilience to noisy data, making it well-suited for real-world conditions.
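To make the formulation concrete, here is a minimal sketch of one update of this kind of method: a sharpness-aware (SAM-style) gradient step on an augmented-Lagrangian objective, followed by a magnitude projection onto the sparsity constraint and a dual update. This is a hypothetical simplification for illustration, not the authors' implementation; the function names, hyperparameters (`rho`, `lam`), and the plain top-k projection are all assumptions.

```python
import numpy as np

def project_sparse(w, k):
    """Keep the k largest-magnitude entries of w and zero the rest
    (Euclidean projection onto the set ||w||_0 <= k)."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def safe_like_step(w, z, u, grad_fn, k, lr=0.1, rho=0.05, lam=1.0):
    """One ADMM-style iteration on min L(w) s.t. ||w||_0 <= k,
    with flatness encouraged via a SAM-style perturbed gradient.

    w: dense weights (primal variable)
    z: sparse split variable
    u: scaled dual variable
    rho: perturbation radius (flatness); lam: penalty strength.
    Hypothetical simplification of SAFE, not the authors' code.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction toward sharper loss
    g_flat = grad_fn(w + eps)                    # gradient at the perturbed point
    w = w - lr * (g_flat + lam * (w - z + u))    # descend the augmented Lagrangian
    z = project_sparse(w + u, k)                 # project onto the sparsity constraint
    u = u + w - z                                # dual ascent
    return w, z, u
```

On a toy quadratic loss, iterating this update drives `w` toward the (sparse) split variable `z`, which always satisfies the sparsity budget by construction; SAFE$^+$'s generalized projection would replace `project_sparse` with a more flexible operation.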

Dongyeop Lee, Kwanhee Lee, Jinseok Chung, Namhoon Lee • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | HellaSwag | Accuracy 52.15 | 1891 |
| Language Modeling | C4 | Perplexity 7.82 | 1071 |
| Question Answering | ARC Challenge | Accuracy 38.14 | 906 |
| Question Answering | ARC Easy | Accuracy 72.14 | 597 |
| Natural Language Inference | RTE | Accuracy 57.04 | 448 |
| Question Answering | BoolQ | Accuracy 74.83 | 317 |
| Language Modeling | Wiki | Perplexity (PPL) 5.73 | 281 |
| Question Answering | OpenBookQA | Accuracy 26 | 126 |
| Commonsense Reasoning | WinoGrande | Accuracy 66.77 | 68 |
| Zero-shot Accuracy | ARC Easy | Zero-shot Acc 66.84 | 63 |

Showing 10 of 23 rows.
