
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

About

Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as a promising solution, significantly reducing memory and storage needs with minimal performance degradation. In this study, we introduce SignRound, a method that leverages signed gradient descent (SignSGD) to optimize rounding values and weight clipping in just 200 steps. SignRound integrates the advantages of Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), delivering exceptional results across 2 to 4 bits while minimizing tuning costs and avoiding additional inference overhead. For example, SignRound achieved absolute average accuracy improvements ranging from 6.91% to 33.22% at 2 bits, as measured by the average zero-shot accuracy across 11 tasks. It also demonstrates strong generalization to recent models, achieving near-lossless 4-bit quantization in most scenarios. The source code is publicly available at https://github.com/intel/auto-round.
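
The abstract's core mechanism, tuning rounding values with signed gradient descent, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration and not the released implementation: it assumes a straight-through estimator for rounding, a per-tensor scale, and a mean-squared reconstruction loss on a single linear layer's output, and it omits the weight-clipping optimization that SignRound also performs. All function names and hyperparameters here are illustrative assumptions.

```python
import torch

def ste_round(t):
    """Round to nearest with a straight-through gradient (identity in the backward pass)."""
    return (t.round() - t).detach() + t

def quantize_weight(w, v, scale, bits):
    """Fake-quantize w using a learnable per-weight rounding offset v."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    q = torch.clamp(ste_round(w / scale + v), qmin, qmax)
    return q * scale

def signround_layer(w, x, bits=4, steps=200, lr=5e-3):
    """Tune rounding offsets for one linear layer on calibration activations x."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1)  # per-tensor scale, purely illustrative
    v = torch.zeros_like(w, requires_grad=True)    # rounding offsets to learn
    y_ref = x @ w.T                                # full-precision reference output
    for _ in range(steps):
        y_q = x @ quantize_weight(w, v, scale, bits).T
        loss = torch.nn.functional.mse_loss(y_q, y_ref)
        (grad,) = torch.autograd.grad(loss, v)
        with torch.no_grad():
            v -= lr * grad.sign()                  # SignSGD: update by the gradient's sign only
            v.clamp_(-0.5, 0.5)                    # keep offsets within half a quantization step
    return quantize_weight(w, v.detach(), scale, bits)

# Example: quantize a random weight matrix with 128 random calibration samples.
w = torch.randn(512, 512)
x = torch.randn(128, 512)
w_q = signround_layer(w, x, bits=4)
```

The defining design choice is the update rule: only the sign of the gradient is used, so every offset moves by a uniform step, which is what allows the tuning to finish in a small, fixed number of steps (around 200 in the paper).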

Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi Liu • 2023

Related benchmarks

Task | Dataset | Metric | Value | Rank
Zero-shot Evaluation | PIQA, WinoGrande, HellaSwag, ARC (Easy and Challenge), LAMBADA (test) | Average Accuracy | 67.7 | 90
Large Language Model Evaluation | 10 tasks average | Avg Accuracy | 69.01 | 50
Commonsense Reasoning | Commonsense Reasoning LLaMA2-7B | Average Accuracy | 63.72 | 18
Common Sense Reasoning | 5 common-sense reasoning tasks (Llama-2-13B) | Accuracy | 66.68 | 15
Common Sense Reasoning | 5 common-sense reasoning tasks (Llama-2-70B) | Average Accuracy | 71.24 | 15
LLM Quantization | Llama-2-70B | GPU Hours (h) | 2.2 | 13
