
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

About

The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit). Specifically, BitDistiller first incorporates a tailored asymmetric quantization and clipping technique to maximally preserve the fidelity of quantized weights, and then proposes a novel Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective, which is employed in a self-distillation manner to enable faster convergence and superior model performance. Empirical evaluations demonstrate that BitDistiller significantly surpasses existing methods in both 3-bit and 2-bit configurations on general language understanding and complex reasoning benchmarks. Notably, BitDistiller is more cost-effective, requiring less data and fewer training resources. The code is available at https://github.com/DD-DuDa/BitDistiller.
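To make the two ingredients concrete, here is a minimal NumPy sketch of (a) asymmetric quantization with a clipping factor and (b) a confidence-aware blend of forward and reverse KL in the spirit of CAKLD. This is an illustrative reading of the abstract, not the paper's implementation: the function names, the clipping scheme, and the way the confidence coefficient `beta` is estimated (mean teacher probability on the ground-truth tokens) are assumptions for demonstration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL(p || q), averaged over token positions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def asym_quantize(w, bits=3, clip=1.0):
    """Asymmetric uniform quantization with an optional clipping factor.

    The range [clip*min, clip*max] is mapped to 2**bits integer levels,
    then dequantized back; clip < 1 trades outlier error for finer steps.
    """
    lo, hi = clip * w.min(), clip * w.max()
    n = 2 ** bits - 1
    scale = (hi - lo) / n
    q = np.clip(np.round((w - lo) / scale), 0, n)
    return q * scale + lo

def estimate_beta(teacher_logits, token_ids):
    """Confidence coefficient: mean teacher probability on ground-truth tokens
    (one plausible estimator; the paper may define it differently)."""
    p = softmax(teacher_logits)
    return float(np.mean(p[np.arange(len(token_ids)), token_ids]))

def cakld_loss(teacher_logits, student_logits, beta):
    """Confidence-aware KL: beta weights the mean-seeking forward KL(P||Q)
    against the mode-seeking reverse KL(Q||P)."""
    p = softmax(teacher_logits)   # full-precision teacher distribution
    q = softmax(student_logits)   # quantized student distribution
    return beta * kl(p, q) + (1.0 - beta) * kl(q, p)
```

With `beta` near 1 (a confident teacher) the objective behaves like standard forward KL distillation; with low confidence it leans on reverse KL, which tends to concentrate the student on the teacher's dominant modes.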

Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 5.97 | 1875 |
| Mathematical Reasoning | GSM8K | Accuracy | 51.02 | 983 |
| Code Generation | HumanEval | Pass@1 | 36.58 | 850 |
| Multi-task Language Understanding | MMLU | -- | -- | 842 |
| Code Generation | HumanEval @WizardCoder (test) | Pass@1 | 69.51 | 45 |
| Mathematical Reasoning | GSM8K @MetaMath (test) | Accuracy | 69.69 | 31 |
| Language Modeling | WikiText-2, Llama 2 & 3 (test) | PPL (Llama 2, Config 7) | 5.97 | 16 |
| General Language Understanding | General Language Tasks Suite (WikiText-2, MMLU, PIQA, HellaSwag, WinoGrande, ARC-Challenge), standard (various) | PPL | 5.2 | 13 |
| Language Understanding and Reasoning | MMLU, PIQA, HellaSwag, WinoGrande, ARC-Challenge | MMLU (5-shot) | 43.65 | 13 |
| LLM Quantization | Llama-2-70B | GPU Hours (h) | 64 | 13 |

Showing 10 of 12 rows.

