Layer-adaptive sparsity for the Magnitude-based Pruning

About

Recent discoveries on neural network pruning reveal that, with a carefully chosen layerwise sparsity, a simple magnitude-based pruning achieves state-of-the-art tradeoff between sparsity and performance. However, without a clear consensus on "how to choose," the layerwise sparsities are mostly selected algorithm-by-algorithm, often resorting to handcrafted heuristics or an extensive hyperparameter search. To fill this gap, we propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score; the score is a rescaled version of weight magnitude that incorporates the model-level $\ell_2$ distortion incurred by pruning, and does not require any hyperparameter tuning or heavy computation. Under various image classification setups, LAMP consistently outperforms popular existing schemes for layerwise sparsity selection. Furthermore, we observe that LAMP continues to outperform baselines even in weight-rewinding setups, while the connectivity-oriented layerwise sparsity (the strongest baseline overall) performs worse than a simple global magnitude-based pruning in this case. Code: https://github.com/jaeho-lee/layer-adaptive-sparsity
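The score described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. It assumes the LAMP definition from the paper: within each layer, a weight's score is its squared magnitude divided by the sum of squared magnitudes of all weights in that layer that are at least as large. The function names `lamp_scores` and `lamp_masks` are hypothetical and are not taken from the linked repository. The sketch also assumes every array passed in is a prunable weight tensor, and it makes no attempt to break ties.

```python
import numpy as np

def lamp_scores(weights):
    """Per-weight LAMP scores for one layer (illustrative sketch)."""
    flat_sq = np.asarray(weights, dtype=np.float64).ravel() ** 2
    order = np.argsort(flat_sq)              # indices, ascending by magnitude
    sorted_sq = flat_sq[order]
    # suffix[i] = sum of sorted_sq[i:], i.e. the squared norm of all weights
    # in this layer that are at least as large as the i-th smallest one.
    suffix = np.cumsum(sorted_sq[::-1])[::-1]
    # Guard against an all-zero tail (assumption: such weights get score 0).
    scores_sorted = np.divide(
        sorted_sq, suffix, out=np.zeros_like(sorted_sq), where=suffix > 0
    )
    scores = np.empty_like(flat_sq)
    scores[order] = scores_sorted            # undo the sort
    return scores.reshape(np.shape(weights))

def lamp_masks(layers, sparsity):
    """Global pruning on LAMP scores: True marks a weight that is kept."""
    scores = [lamp_scores(w) for w in layers]
    all_scores = np.concatenate([s.ravel() for s in scores])
    num_pruned = int(sparsity * all_scores.size)
    if num_pruned == 0:
        return [np.ones(s.shape, dtype=bool) for s in scores]
    # Score of the num_pruned-th smallest weight across the whole model.
    threshold = np.partition(all_scores, num_pruned - 1)[num_pruned - 1]
    # Ties at the threshold are all pruned, so slightly more than the
    # requested fraction may be removed.
    return [s > threshold for s in scores]

# Toy model with two "layers" of very different scale.
layers = [np.array([1.0, 2.0, 3.0]), np.array([[0.1, 0.2]])]
masks = lamp_masks(layers, sparsity=0.5)
```

Because the largest weight in every layer has score 1, a single global threshold below 1 keeps at least one weight in each layer, whatever the layer's scale. The layerwise sparsities then follow from the threshold instead of being set by hand.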

Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, Jinwoo Shin • 2020

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	WinoGrande	Accuracy 74.74	1581
Natural Language Inference	RTE	Accuracy 62.45	590
Question Answering	OBQA	Accuracy 35.4	347
Language Modeling	WikiText	Word Perplexity 4.98	331
Science Question Answering	ARC-C	Accuracy 49.57	268
Science Question Answering	ARC-E	Accuracy 76.4	240
Question Answering	BoolQ	Accuracy 79.15	233
Image Classification	CIFAR100 (test)	Accuracy 23.11	98
Question Answering	ARC-E	Accuracy (%) 75.7	39
Question Answering	OBQA	Accuracy (Normalized) 35	29

Showing 10 of 14 rows
