Towards Superior Quantization Accuracy: A Layer-sensitive Approach

About

Large Vision and Language Models have exhibited remarkable human-like intelligence in tasks such as natural language comprehension, problem-solving, logical reasoning, and knowledge retrieval. However, training and serving these models require substantial computational resources, posing a significant barrier to their widespread application and further research. To mitigate this challenge, various model compression techniques have been developed to reduce computational requirements. Nevertheless, existing methods often employ uniform quantization configurations, failing to account for how difficult different layers of a large neural network are to quantize. This paper tackles the issue by leveraging layer-sensitivity features, such as activation sensitivity and weight-distribution kurtosis, to identify layers that are challenging to quantize accurately and to allocate them additional memory budget. The proposed methods, named SensiBoost and KurtBoost, demonstrate notable improvements in quantization accuracy, achieving up to 9% lower perplexity on Llama models with only a 2% increase in memory budget compared to the baseline.
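To make the layer-sensitive idea concrete, the sketch below shows one way weight-distribution kurtosis could be used to flag hard-to-quantize layers and grant them a larger bit budget. This is a minimal illustration of the general technique, not the paper's SensiBoost/KurtBoost implementation; the function names, bit-widths, and boost fraction are assumptions chosen for the example.

```python
# Illustrative sketch (not the paper's released code): rank linear layers by
# the excess kurtosis of their weights and give the heaviest-tailed layers a
# higher quantization bit-width, leaving the rest at the base budget.
import torch
import torch.nn as nn


def excess_kurtosis(w: torch.Tensor) -> float:
    """Fourth standardized moment minus 3. Large positive values indicate
    heavy tails, i.e. outlier weights that are hard to quantize accurately."""
    w = w.flatten().float()
    centered = w - w.mean()
    var = centered.pow(2).mean()
    return (centered.pow(4).mean() / (var.pow(2) + 1e-12) - 3.0).item()


def assign_bit_widths(model: nn.Module, base_bits: int = 4,
                      boost_bits: int = 8, boost_fraction: float = 0.02):
    """Return a per-layer bit-width map: the top `boost_fraction` of linear
    layers by kurtosis get `boost_bits`; all others get `base_bits`.
    All parameter values here are hypothetical defaults."""
    scores = {name: excess_kurtosis(m.weight)
              for name, m in model.named_modules()
              if isinstance(m, nn.Linear)}
    n_boost = max(1, int(len(scores) * boost_fraction))
    boosted = set(sorted(scores, key=scores.get, reverse=True)[:n_boost])
    return {name: (boost_bits if name in boosted else base_bits)
            for name in scores}
```

An activation-sensitivity score could be plugged into `assign_bit_widths` in place of (or alongside) the kurtosis score to sketch the sensitivity-based variant in the same way.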

Feng Zhang, Yanbin Liu, Weihua Li, Jie Lv, Xiaodan Wang, Quan Bai • 2025

Related benchmarks

Task                     Dataset      Metric       Result   Rank
Language Modeling        C4           Perplexity   8.41     1071
Commonsense Reasoning    PIQA         Accuracy     72.48    751
Common Sense Reasoning   HellaSwag    Accuracy     76.58    213
Common Sense Reasoning   BoolQ        Accuracy     78.13    212
Common Sense Reasoning   WinoGrande   Accuracy     73.12    189
Language Modeling        WikiText2    Perplexity   6.62     162
Reasoning                PIQA         Accuracy     75.97    145
Reasoning                ARC-C        Accuracy     56.76    80
Commonsense Reasoning    TruthfulQA   Accuracy     28.35    28
Language Reasoning       TruthfulQA   Accuracy     30.77    12
(Showing 10 of 12 rows.)
