
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

About

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
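To make the sparsity patterns named in the abstract concrete, here is a minimal sketch of how 50% unstructured sparsity differs from a 2:4 semi-structured pattern. It selects weights by plain magnitude, which is an assumption made for illustration and not the SparseGPT procedure itself; the helper names `nm_mask` and `unstructured_mask` are hypothetical and do not come from the linked repository.

```python
def nm_mask(weights, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude weights in each group of m.

    Illustrative magnitude criterion only; SparseGPT selects weights differently.
    """
    if len(weights) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest |w| in this group (ties broken by position).
        keep = sorted(range(m), key=lambda i: -abs(group[i]))[:n]
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


def unstructured_mask(weights, sparsity=0.5):
    """Return a 0/1 mask zeroing the smallest-magnitude fraction of the whole row."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


# A toy weight row (made-up values, not taken from any model).
row = [0.9, -0.8, 0.7, -0.6, 0.1, 0.2, -0.05, 0.3]

mask_24 = nm_mask(row, n=2, m=4)            # exactly 2 survivors per group of 4
mask_un = unstructured_mask(row, 0.5)       # 50% survivors, placed anywhere
pruned_row = [w * k for w, k in zip(row, mask_24)]

# Rough arithmetic behind "more than 100 billion weights":
# 60% of a model with about 175 billion parameters.
pruned_params = 0.60 * 175e9

print(mask_24)
print(mask_un)
print(f"{pruned_params:.3g} weights removed at 60% sparsity")
```

Both masks remove half of the toy row, but the 2:4 mask must drop two weights in every group of four, while the unstructured mask is free to keep all of the large weights in the first group. That constraint is what makes semi-structured patterns friendlier to hardware and harder to prune accurately.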

Elias Frantar, Dan Alistarh • 2023

Related benchmarks

Task                             Dataset            Result                 Rank
Language Modeling                WikiText2          Perplexity 5.84        3785
Language Modeling                WikiText-2 (test)  PPL 8.32               2333
Language Modeling                WikiText-2         Perplexity (PPL) 4.25  2320
Object Hallucination Evaluation  POPE               Accuracy 88.21         2019
Commonsense Reasoning            HellaSwag          Accuracy 52.7          1896
Visual Question Answering        VizWiz             Accuracy 65.28         1820
Language Modeling                C4                 Perplexity 8.22        1688
Language Modeling                C4                 Perplexity 27.62       1565
Visual Question Answering        TextVQA            --                     1453
Commonsense Reasoning            WinoGrande         Accuracy 78.85         1442
Showing 10 of 188 rows
