
Movement Pruning: Adaptive Sparsity by Fine-Tuning

About

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
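The core contrast is between magnitude pruning, which keeps the weights with the largest absolute value, and movement pruning, which keeps weights whose accumulated first-order score S = -Σ_t (∂L/∂W) · W is highest, i.e. weights that move away from zero during fine-tuning. Below is a minimal NumPy sketch of these two selection rules under a hard top-k mask; the function names and the explicit weight/gradient histories are illustrative, not the paper's implementation.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Zeroth-order selection: keep the (1 - sparsity) fraction of
    weights with the largest absolute value."""
    k = int(weights.size * (1 - sparsity))
    thresh = np.sort(np.abs(weights).ravel())[-k]
    return (np.abs(weights) >= thresh).astype(weights.dtype)

def movement_scores(weight_history, grad_history):
    """First-order movement score S = -sum_t grad_t * w_t.
    A weight whose gradient pushes it away from zero (grad and
    weight have opposite signs) accumulates a positive score."""
    scores = np.zeros_like(weight_history[0])
    for w, g in zip(weight_history, grad_history):
        scores -= g * w
    return scores

def movement_mask(scores, sparsity):
    """Keep the (1 - sparsity) fraction of weights with the
    highest movement scores."""
    k = int(scores.size * (1 - sparsity))
    thresh = np.sort(scores.ravel())[-k]
    return (scores >= thresh).astype(scores.dtype)

# Toy example: magnitude keeps large weights regardless of training
# dynamics; movement keeps weights being pushed away from zero.
w = np.array([0.1, -2.0, 0.5, -0.05])
print(magnitude_mask(w, sparsity=0.5))   # keeps -2.0 and 0.5

w_t = np.array([1.0, -1.0])
g_t = np.array([-0.5, -0.5])             # first weight grows, second shrinks
s = movement_scores([w_t], [g_t])
print(movement_mask(s, sparsity=0.5))    # keeps the growing weight
```

In the paper the mask is not computed from a frozen score at the end; scores are learned during fine-tuning and the top-k (or a thresholded) mask is applied in the forward pass, with gradients flowing to the scores via the straight-through estimator. The sketch above only illustrates the selection criteria themselves.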

Victor Sanh, Thomas Wolf, Alexander M. Rush • 2020

Related benchmarks

Task                             Dataset             Metric               Result   Rank
Natural Language Understanding   GLUE                --                   --       531
Natural Language Understanding   GLUE (dev)          --                   --       518
Question Answering               SQuAD v1.1 (dev)    F1 Score             84.9     380
Image Classification             ImageNet-1K         Top-1 Accuracy       82.1     158
Question Answering               SQuAD               F1                   87.6     134
Natural Language Inference       MNLI (matched)      Accuracy             81.2     110
Natural Language Inference       MNLI                Accuracy (matched)   82.5     80
Paraphrase Identification        QQP                 Accuracy             91       78
Natural Language Inference       MNLI (mismatched)   Accuracy             81.8     68
Natural Language Inference       MNLI (test)         Accuracy             81.2     48

(10 of 11 rows shown)
