LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation

About

Deploying large language models (LLMs) in resource-constrained environments is hindered by their heavy computational and memory requirements. We present LBLLM, a lightweight binarization framework that achieves effective W(1+1)A4 quantization through a novel three-stage quantization strategy. The framework proceeds as follows: (1) initialize a high-quality quantized model via PTQ; (2) quantize binarized weights, group-wise bitmaps, and quantization parameters through layer-wise distillation while keeping activations in full precision; and (3) train learnable activation quantization factors to dynamically quantize activations to 4 bits. This decoupled design mitigates interference between weight and activation quantization, yielding greater training stability and better inference accuracy. LBLLM, trained using only 0.016B tokens on a single GPU, surpasses existing state-of-the-art binarization methods in W2A4 quantization settings across language modeling, commonsense QA, and language understanding tasks. These results demonstrate that extreme low-bit quantization of LLMs can be both practical and highly effective without introducing the extra high-precision channels or rotation matrices commonly used in recent PTQ-based works, offering a promising path toward efficient LLM deployment in resource-limited settings.
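To make the W(1+1)A4 notation more concrete, the following is a minimal sketch of generic sign-based weight binarization with a two-group bitmap, together with symmetric 4-bit dynamic activation quantization. It is not LBLLM's actual algorithm: reading the "+1" bit as a per-weight group-membership bit, the magnitude-based bitmap rule, and the names `magnitude_bitmap`, `binarize_row`, `quantize_activations_4bit`, and `clip_factor` are all assumptions made for illustration. In the framework described above, the bitmaps and quantization factors are learned through distillation rather than computed in closed form.

```python
# Illustrative sketch only; the helper names and the bitmap rule are hypothetical
# and are not taken from the LBLLM paper.

def magnitude_bitmap(weights):
    """Hypothetical bitmap rule: flag weights whose magnitude exceeds the row mean."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    return [1 if abs(w) > mean_abs else 0 for w in weights]


def binarize_row(weights, bitmap):
    """Approximate each weight as sign(w) * scale, with one scale per bitmap group.

    Assumption: bitmap[i] in {0, 1} is the extra bit in "W(1+1)".
    """
    scales = []
    for group in (0, 1):
        members = [abs(w) for w, b in zip(weights, bitmap) if b == group]
        # The mean absolute value is the L2-optimal scale for sign binarization.
        scales.append(sum(members) / len(members) if members else 0.0)
    signs = [1 if w >= 0 else -1 for w in weights]
    dequantized = [s * scales[b] for s, b in zip(signs, bitmap)]
    return signs, scales, dequantized


def quantize_activations_4bit(acts, clip_factor=1.0):
    """Symmetric dynamic 4-bit quantization to integers in [-8, 7].

    clip_factor is a stand-in for a learnable activation quantization factor.
    """
    max_abs = max(abs(a) for a in acts) * clip_factor
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(a / scale))) for a in acts]
    return q, scale


if __name__ == "__main__":
    row = [0.1, -0.2, 0.9, -1.0]
    bitmap = magnitude_bitmap(row)
    signs, scales, approx = binarize_row(row, bitmap)
    print(bitmap, signs, scales, approx)
    print(quantize_activations_4bit([0.0, 3.0, -7.0, 7.0]))
```

Storing one sign bit and one bitmap bit per weight gives two bits of storage per weight, which is consistent with the method being compared against W2A4 baselines; the per-group scales are shared and add only a small overhead.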

Siqing Song, Chuang Wang, Yong Lang, Yi Yang, Xu-Yao Zhang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Language Modeling	WikiText-2	Perplexity (PPL)	9.08	2862
Language Modeling	WikiText-2 (test)	PPL	9.18	2416
Language Modeling	C4	Perplexity	10.42	1688
Language Modeling	PTB	Perplexity	26.4	1234
Language Modeling	C4 (val)	PPL	10.91	908
Language Modeling	Wiki	Perplexity (PPL)	18.53	298
Common Sense Reasoning	BoolQ	Accuracy	70.28	280
Reasoning	ARC Easy	Accuracy	53.11	242
Multiple-choice Question Answering	HellaSwag	Accuracy	58.54	212
Reasoning	HellaSwag (HS)	HellaSwag Accuracy	60.05	209

Showing 10 of 31 rows
