
Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

About

1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch, failing to fully leverage pre-trained models. This results in high training costs and notable accuracy degradation. We identify that the large gap between full-precision and 1-bit representations makes naive adaptation difficult. In this paper, we introduce consistent progressive training for both the forward and backward passes, smoothly converting the full-precision weights into binarized ones. Additionally, we incorporate binary-aware initialization and dual-scaling compensation to reduce the difficulty of progressive training and improve performance. Experimental results on LLMs of various sizes demonstrate that our method outperforms existing approaches. Our results show that high-performance 1-bit LLMs can be achieved using pre-trained models, eliminating the need for expensive training from scratch.
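
To make the idea of smoothly converting full-precision weights into binarized ones concrete, here is a minimal sketch. The abstract does not give the paper's actual formulation, so everything below is an assumption for illustration: the linear blend controlled by `lam`, the mean-absolute-value scale, and the helper names `binarize`, `progressive_weights`, and `linear_schedule` are all hypothetical. Binary-aware initialization and dual-scaling compensation are not modeled.

```python
def sign(x):
    """Map a value to +1.0 or -1.0 (zero maps to +1.0)."""
    return 1.0 if x >= 0 else -1.0


def binarize(weights):
    """Return scale * sign(w) for each weight.

    Assumption: the scale is the mean absolute weight, a common
    choice in 1-bit quantization, not necessarily the paper's.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale * sign(w) for w in weights]


def progressive_weights(weights, lam):
    """Blend full-precision and binarized weights.

    lam = 0.0 gives the original weights, lam = 1.0 gives the fully
    binarized weights. The linear blend is an illustrative assumption.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    binary = binarize(weights)
    return [(1.0 - lam) * w + lam * b for w, b in zip(weights, binary)]


def linear_schedule(step, total_steps):
    """Hypothetical schedule: lam grows linearly from 0 to 1."""
    return min(1.0, step / total_steps)


if __name__ == "__main__":
    pretrained = [0.5, -0.25, 0.75, -1.5]
    for step in (0, 5, 10):
        lam = linear_schedule(step, total_steps=10)
        print(step, lam, progressive_weights(pretrained, lam))
```

Under these assumptions, training would start from the pre-trained weights unchanged and end with weights that take only two values, which is one plausible way to narrow the gap between the two representations gradually instead of all at once.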

Zhijun Tu, Jian Li, Yuanyuan Xi, Siqi Liu, Chuanjian Liu, Hanting Chen, Jie Hu, Yunhe Wang • 2025

Related benchmarks

Task                   Dataset         Result                             Rank
Language Modeling      C4              Perplexity 17.1                    1688
Commonsense Reasoning  WinoGrande      Accuracy 66.4                      1442
Language Modeling      PTB             Perplexity 20.4                    1234
Question Answering     PIQA            Accuracy 72.7                      505
Question Answering     OBQA            Accuracy 42                        347
Language Modeling      Wiki2           PPL 12.4                           326
Language Modeling      WikiText2       Perplexity 12.4                    277
Question Answering     BoolQ           Accuracy 62.2                      201
Commonsense Reasoning  HellaSwag (HS)  HS Accuracy 58.3                   66
Question Answering     ARC-E           Normalized Accuracy (ARC-E) 63.1   59
Showing 10 of 13 rows
