The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

About

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
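The "1.58 bits" in the title follows from the ternary weight set: a value drawn from three states carries at most log2(3) ≈ 1.58 bits of information. The abstract does not describe how full-precision weights are mapped to {-1, 0, 1}, so the sketch below assumes an absmean-style scheme (divide by the mean absolute weight, round, then clip to [-1, 1]). The names `bits_per_ternary_weight` and `ternarize` are hypothetical and used only for illustration.

```python
import math


def bits_per_ternary_weight() -> float:
    """Information content of one weight drawn from three states."""
    return math.log2(3)


def ternarize(weights, eps=1e-8):
    """Map real-valued weights to {-1, 0, 1}.

    Assumption: absmean-style scaling. The abstract above does not
    specify the quantizer; this is one plausible scheme.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    # Scale factor: mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale


weights = [0.9, -0.05, -1.4, 0.3, 0.0, -0.6]
ternary, scale = ternarize(weights)
print(round(bits_per_ternary_weight(), 2))  # 1.58
print(ternary)                              # [1, 0, -1, 1, 0, -1]
```

With ternary weights, each term of a matrix-vector product is an addition, a subtraction, or a skip, which is one way to read the abstract's remark about a new computation paradigm.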

Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei • 2024

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-2	--	2862
Language Modeling	C4	Perplexity 9.8	1688
Commonsense Reasoning	WinoGrande	Accuracy 59.3	1581
Language Modeling	C4	Perplexity 11.06	1565
Language Modeling	PTB	Perplexity 85	1234
Question Answering	ARC Challenge	Accuracy (ARC) 25.77	631
Question Answering	PIQA	Accuracy 71.5	589
Commonsense Reasoning	PIQA	Accuracy 53.21	400
Language Modeling	Wiki2	PPL 10	382
Question Answering	OBQA	Accuracy 61.5	347

Showing 10 of 32 rows
