
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

About

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
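To make the semi-structured patterns mentioned above concrete: a 2:4 pattern requires that in every contiguous group of 4 weights, at most 2 are nonzero. The sketch below illustrates that constraint using simple magnitude-based selection. This is an assumption-laden illustration of the *pattern* only, not SparseGPT itself, which instead solves a layer-wise reconstruction problem to decide which weights to drop and updates the remaining weights.

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Enforce a 2:4 semi-structured sparsity pattern by magnitude:
    in every contiguous group of 4 weights along the last axis,
    keep the 2 largest-magnitude entries and zero the other 2.
    (Illustrative only; SparseGPT chooses weights via layer-wise
    reconstruction, not raw magnitude.)
    """
    flat = weights.reshape(-1, 4)                    # groups of 4 weights
    # indices of the 2 smallest-magnitude entries in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    pruned = flat.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)     # zero them out
    return pruned.reshape(weights.shape)

W = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.03, 0.6]])
print(prune_2_4(W))  # → [[ 0.9  0.   0.4  0.  -0.7  0.   0.   0.6]]
```

The 2:4 pattern matters in practice because it matches the structured-sparsity support in recent GPU tensor cores, so the zeros can be skipped at inference time.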

Elias Frantar, Dan Alistarh · 2023

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText2 | Perplexity 20.97 | 1875
Language Modeling | WikiText-2 (test) | PPL 8.32 | 1541
Commonsense Reasoning | HellaSwag | Accuracy 52.7 | 1460
Visual Question Answering | TextVQA | -- | 1117
Visual Question Answering | VizWiz | Accuracy 50.05 | 1043
Language Modeling | WikiText-2 | Perplexity (PPL) 8.2 | 841
Commonsense Reasoning | WinoGrande | Accuracy 50.91 | 776
Language Understanding | MMLU | Accuracy 33.3 | 756
Question Answering | ARC Challenge | Accuracy 38.23 | 749
Language Modeling | PTB | Perplexity 38.05 | 650
(Showing 10 of 84 rows)
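Several of the language-modeling results above are reported as perplexity. As a quick reference, perplexity is the exponential of the average per-token negative log-likelihood, so lower is better; the `token_probs` values below are made-up illustrative probabilities, not numbers from the benchmarks.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood over tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that assigns each token probability 1/4 has perplexity 4:
# it is as uncertain as a uniform choice among 4 tokens.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # → 4.0
```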

Other info

Code: https://github.com/IST-DASLab/sparsegpt