RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs
About
Efficient deployment of large language models (LLMs) requires extreme quantization, forcing a critical trade-off between low-bit efficiency and performance. Residual binarization enables hardware-friendly, matmul-free inference by stacking binary ($\pm$1) layers, but is plagued by pathological feature co-adaptation. We identify a key failure mode, which we term *inter-path adaptation*: during quantization-aware training (QAT), parallel residual binary paths learn redundant features, degrading the error-compensation structure and limiting the expressive capacity of the model. While prior work relies on heuristic workarounds (e.g., path freezing) that constrain the solution space, we propose RaBiT, a novel quantization framework that resolves co-adaptation by algorithmically enforcing a residual hierarchy. Its core mechanism sequentially derives each binary path from a single shared full-precision weight tensor, which ensures that every path corrects the error left by the preceding one. This process is stabilized by a robust initialization that prioritizes functional preservation over mere weight approximation. RaBiT redefines the 2-bit accuracy-efficiency frontier: it achieves state-of-the-art performance, rivals even hardware-intensive Vector Quantization (VQ) methods, and delivers a $4.49\times$ inference speed-up over full-precision models on an RTX 4090.
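The sequential derivation described above follows the classic greedy residual-binarization recipe: each binary path fits whatever error the previous paths left behind. Below is a minimal sketch of that decomposition, assuming PyTorch, per-tensor sign codes with a mean-of-absolutes scale (the standard least-squares choice for a binary code), and the illustrative function name `residual_binarize`; it is not RaBiT's actual implementation and omits the paper's function-preserving initialization and QAT loop.

```python
import torch

def residual_binarize(w: torch.Tensor, num_paths: int = 2):
    """Greedy residual binarization (illustrative sketch, not RaBiT's exact code).

    Decomposes a full-precision weight tensor into `num_paths` binary
    (+/-1) codes with per-tensor scales, where path k fits the residual
    left by paths 1..k-1:

        w ~= sum_k alpha_k * B_k
        B_k = sign(r_{k-1}),  alpha_k = mean(|r_{k-1}|),  r_k = r_{k-1} - alpha_k * B_k

    mean(|r|) is the optimal scale for a sign code: minimizing
    ||r - alpha * sign(r)||^2 over alpha yields alpha = mean(|r|).
    """
    residual = w.clone()
    scales, codes = [], []
    for _ in range(num_paths):
        code = torch.sign(residual)
        code[code == 0] = 1.0               # keep codes strictly in {-1, +1}
        scale = residual.abs().mean()       # per-tensor scale; per-channel is also common
        scales.append(scale)
        codes.append(code)
        residual = residual - scale * code  # the next path only sees the leftover error
    return scales, codes

# Two paths give a 2-bit-style residual decomposition of one weight matrix.
w = torch.randn(1024, 1024)
scales, codes = residual_binarize(w, num_paths=2)
w_hat = sum(s * b for s, b in zip(scales, codes))
print("relative error:", ((w - w_hat).norm() / w.norm()).item())
```

Because every path is re-derived from the one shared full-precision tensor, the hierarchy (path $k$ models only the residual of paths $1..k-1$) holds by construction rather than being learned, which is the property the abstract credits with preventing inter-path adaptation.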
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 6.66 | 1541 |
| Language Modeling | C4 | Perplexity | 6.51 | 1182 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 4.84 | 841 |
| Reasoning | BBH | Accuracy | 37.72 | 507 |
| Language Modeling | C4 (val) | PPL | 10.18 | 392 |
| Instruction Following | IFEval | -- | -- | 292 |
| Question Answering | GPQA | Accuracy | 28.62 | 258 |
| Multitask Language Understanding | MMLU-Pro | Accuracy | 19.65 | 99 |
| Question Answering | QA Zero-shot Average | QA Zero-shot Average | 68.85 | 57 |