SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

About

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.

Elias Frantar, Dan Alistarh• 2023

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText2	Perplexity5.84	3785
Language Modeling	WikiText-2 (test)	PPL8.32	2333
Language Modeling	WikiText-2	Perplexity (PPL)4.25	2320
Object Hallucination Evaluation	POPE	Accuracy88.21	2019
Commonsense Reasoning	HellaSwag	Accuracy52.7	1896
Visual Question Answering	VizWiz	Accuracy65.28	1820
Language Modeling	C4	Perplexity8.22	1688
Language Modeling	C4	Perplexity27.62	1565
Visual Question Answering	TextVQA	--	1453
Commonsense Reasoning	WinoGrande	Accuracy78.85	1442

Showing 10 of 188 rows

...

Other info

Code

Follow for update

@wizwand_team Discord