SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

About

Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths at the group-wise. Our approach leverages the observation that important weights follow a structured distribution and introduces two key components: \textbf{1)} \textit{Salience-Determined Bit Allocation} adaptively assigns bit-widths to groups within each layer based on their salience; and \textbf{2)} \textit{Salience-Weighted Quantizer Calibration} optimizes quantizer parameters by incorporating element-level salience. With its structured partitioning, SliM-LLM provides a hardware-friendly solution that matches the efficiency of uniform quantization methods while improving accuracy. Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths. For example, a 2-bit quantized LLaMA-7B model reduces memory usage by nearly 6x compared to the floating-point baseline, decreases perplexity by 48\% compared to state-of-the-art gradient-free PTQ methods, and maintains GPU inference speed. Additionally, the extended version, SliM-LLM$^+$, which incorporates gradient-based quantization, further reduces perplexity by 35.1\%. Our code is available at https://github.com/Aaronhuang-778/SliM-LLM

Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Qinshuo Liu, Xianglong Liu, Luca Benini, Michele Magno, Shiming Zhang, Xiaojuan Qi• 2024

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-2	Perplexity (PPL)4.08	2320
Language Modeling	C4 (val)	PPL8.4	737
Question Answering	PIQA	Accuracy61.7	505
Language Modeling	WikiText2 v1 (test)	Perplexity6.07	383
Sentence Completion	HellaSwag	Accuracy44.1	364
Mathematical Reasoning	MathQA	Accuracy23.62	354
Multi-task Language Understanding	MMLU	Accuracy25.43	353
Science Question Answering	ARC-C	Accuracy28.24	261
Science Question Answering	ARC-E	Accuracy49.07	240
Reading Comprehension	BoolQ	Accuracy (BoolQ)66.18	228

Showing 10 of 19 rows

Other info

Follow for update

@wizwand_team Discord