
Post Training Quantization of Large Language Models with Microscaling Formats

About

Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of three well-known post-training quantization (PTQ) techniques, SmoothQuant, AWQ, and GPTQ, and provide a comprehensive analysis of their interactions and implications for advancing LLM quantization. We enhance the versatility of these methods by enabling quantization to microscaling (MX) formats, extending the applicability of these PTQ algorithms beyond their original fixed-point format targets. We show that combining different PTQ methods enables us to quantize models to 4-bit weights and 8-bit activations using the MXINT format with negligible accuracy loss compared to the uncompressed baseline.
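The MX (microscaling) formats referenced in the abstract pair a single shared scale with a block of low-bit integer elements. The sketch below is illustrative only, not the authors' implementation: a NumPy fake-quantizer for an MXINT-style format, assuming a per-block power-of-two scale (in the spirit of the OCP MX spec's E8M0 scale and block size of 32) and signed 4-bit elements. The function name `quantize_mxint` is a hypothetical choice.

```python
import numpy as np

def quantize_mxint(x, elem_bits=4, block_size=32):
    """Fake-quantize `x` to an MXINT-style format: each block of
    `block_size` values shares one power-of-two scale, and each
    element is stored as a signed `elem_bits`-bit integer.
    Returns the dequantized array for error inspection.
    (Illustrative sketch, not the paper's implementation.)"""
    assert x.size % block_size == 0, "pad the tensor to a multiple of the block size"
    qmax = 2 ** (elem_bits - 1) - 1           # e.g. 7 for 4-bit elements
    blocks = x.reshape(-1, block_size)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # Smallest power-of-two scale mapping the block max into [-qmax, qmax];
    # all-zero blocks get a scale of 1 to avoid log2(0).
    safe = np.where(amax > 0, amax, float(qmax))
    scale = 2.0 ** np.ceil(np.log2(safe / qmax))
    q = np.clip(np.round(blocks / scale), -qmax - 1, qmax)  # integer codes
    return (q * scale).reshape(x.shape)        # dequantize
```

Because the shared scale is a power of two, dequantization is a cheap exponent shift in hardware; the per-block granularity is what lets 4-bit elements track the wide dynamic range of LLM weights.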

Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang • 2024

Related benchmarks

Task                    Dataset          Result                     Rank
Language Modeling       WikiText         PPL 18.9                   479
Reasoning               ARC Easy         Accuracy 71.13             183
Reasoning               HellaSwag (HS)   Accuracy 63.33             142
Common Sense Reasoning  BoolQ            Accuracy 76.73             131
Reasoning               ARC Challenge    Accuracy 44.8              70
