
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

About

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.

Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei (2024)
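The abstract says every weight is ternary. A weight that can take only three values carries log2(3) ≈ 1.58 bits of information, which is where the "1.58 bits" in the title comes from. The sketch below is a minimal NumPy illustration. It assumes a per-tensor absmean scale: divide the weight matrix by its mean absolute value, then round and clip to {-1, 0, 1}. This follows the quantizer described in the full paper, which is not reproduced on this page. The function names `absmean_ternary` and `ternary_matvec` are hypothetical and do not come from any released code. The sketch also shows why ternary weights open the door to cheaper hardware: a matrix-vector product with such weights needs only additions and subtractions, with no multiplications by weights.

```python
import math

import numpy as np

# A weight restricted to three values carries log2(3) bits of information.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585


def absmean_ternary(w: np.ndarray, eps: float = 1e-5) -> tuple[np.ndarray, float]:
    """Quantize a weight matrix to {-1, 0, 1} with a single per-tensor scale.

    Assumed scheme: gamma = mean(|w|); each weight is divided by gamma,
    rounded to the nearest integer, and clipped to [-1, 1].
    """
    gamma = float(np.mean(np.abs(w)))
    q = np.clip(np.round(w / (gamma + eps)), -1, 1).astype(np.int8)
    return q, gamma


def ternary_matvec(q: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute q @ x for a ternary matrix q using only additions and subtractions."""
    out = np.zeros(q.shape[0], dtype=x.dtype)
    for i, row in enumerate(q):
        # +1 weights add the input, -1 weights subtract it, 0 weights skip it.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out


if __name__ == "__main__":
    w = np.array([[0.4, -0.05, -0.9], [0.02, 0.6, -0.3]])
    x = np.array([1.0, 2.0, 3.0])
    q, gamma = absmean_ternary(w)
    print(q.tolist())                     # [[1, 0, -1], [0, 1, -1]]
    print(ternary_matvec(q, x).tolist())  # [-2.0, -1.0]
    print(gamma * ternary_matvec(q, x))   # rescaled approximation of w @ x
```

The explicit loop is chosen for readability, not speed. A practical kernel would likely pack the ternary values and vectorize the additions. The integer-weight result is multiplied back by `gamma` to approximate the full-precision product `w @ x`.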

Related benchmarks

Task                  | Dataset       | Metric         | Result | Rank
--------------------- | ------------- | -------------- | ------ | ----
Language Modeling     | WikiText-2    | --             | --     | 2320
Language Modeling     | C4            | Perplexity     | 9.8    | 1688
Language Modeling     | C4            | Perplexity     | 11.06  | 1565
Commonsense Reasoning | WinoGrande    | Accuracy       | 59.3   | 1442
Language Modeling     | PTB           | Perplexity     | 85     | 1234
Question Answering    | ARC Challenge | Accuracy (ARC) | 25.77  | 598
Question Answering    | PIQA          | Accuracy       | 71.5   | 505
Question Answering    | OBQA          | Accuracy       | 61.5   | 347
Language Modeling     | Wiki2         | PPL            | 10     | 326
Question Answering    | OpenBookQA    | Accuracy       | 26.4   | 305

10 of 25 rows shown.
