
RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

About

Efficient deployment of large language models (LLMs) requires extreme quantization, forcing a critical trade-off between low-bit efficiency and performance. Residual binarization enables hardware-friendly, matmul-free inference by stacking binary ($\pm$1) layers, but is plagued by pathological feature co-adaptation. We identify a key failure mode, which we term inter-path adaptation: during quantization-aware training (QAT), parallel residual binary paths learn redundant features, degrading the error-compensation structure and limiting the expressive capacity of the model. While prior work relies on heuristic workarounds (e.g., path freezing) that constrain the solution space, we propose RaBiT, a novel quantization framework that resolves co-adaptation by algorithmically enforcing a residual hierarchy. Its core mechanism sequentially derives each binary path from a single shared full-precision weight, which ensures that every path corrects the error of the preceding one. This process is stabilized by a robust initialization that prioritizes functional preservation over mere weight approximation. RaBiT redefines the 2-bit accuracy-efficiency frontier: it achieves state-of-the-art performance, rivals even hardware-intensive Vector Quantization (VQ) methods, and delivers a $4.49\times$ inference speed-up over full-precision models on an RTX 4090.
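The residual hierarchy described above — stacking binary paths where each path approximates the error left by the preceding ones — can be illustrated with a generic greedy residual binarization sketch. This is a minimal, hypothetical illustration of the general technique (scales chosen as the mean absolute residual, signs as the binary pattern), not RaBiT's actual training procedure; the function name and parameters are assumptions for exposition.

```python
import numpy as np

def residual_binarize(W, num_paths=2):
    """Greedy residual binarization sketch: each binary (+1/-1) path
    approximates the quantization error left by the preceding paths.
    Illustrative only -- not the RaBiT algorithm itself."""
    residual = W.copy()
    alphas, signs = [], []
    for _ in range(num_paths):
        B = np.sign(residual)          # binary +1/-1 path
        B[B == 0] = 1.0                # avoid zeros in the sign pattern
        a = np.abs(residual).mean()    # per-path scale for the binary code
        alphas.append(a)
        signs.append(B)
        residual = residual - a * B    # next path sees only the leftover error
    W_hat = sum(a * B for a, B in zip(alphas, signs))
    return W_hat, alphas, signs

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W1, _, _ = residual_binarize(W, num_paths=1)  # 1-bit approximation
W2, _, _ = residual_binarize(W, num_paths=2)  # 2-bit (two stacked paths)
err1 = np.linalg.norm(W - W1)
err2 = np.linalg.norm(W - W2)
```

Because each path is fit directly to the remaining residual, adding a path strictly reduces the approximation error — the error-compensation structure that inter-path co-adaptation degrades when paths instead learn redundant features.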

Youngcheon You, Banseok Lee, Minseop Choi, Seonyoung Kim, Hyochan Chong, Changdong Kim, Youngmin Kim, Dongkyu Kim • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 6.66 | 1541 |
| Language Modeling | C4 | Perplexity | 6.51 | 1182 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 4.84 | 841 |
| Reasoning | BBH | Accuracy | 37.72 | 507 |
| Language Modeling | C4 (val) | PPL | 10.18 | 392 |
| Instruction Following | IFEval | -- | -- | 292 |
| Question Answering | GPQA | Accuracy | 28.62 | 258 |
| Multitask Language Understanding | MMLU-Pro | Accuracy | 19.65 | 99 |
| Question Answering | QA Zero-shot Average | QA Zero-shot Average | 68.85 | 57 |
