Highly Efficient and Effective LLMs with Multi-Boolean Architectures

About

Weight binarization has emerged as a promising strategy for reducing the complexity of large language models (LLMs). Existing approaches fall into two camps: post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct fine-tuning of LLMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both fine-tuning and inference. Extensive experiments across diverse LLMs show that our method outperforms recent ultra-low-bit quantization and binarization techniques.

Ba-Hien Tran, Van Minh Nguyen • 2025
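The core idea is to replace a full-precision weight matrix with a small sum of Boolean (sign) kernels, W ≈ Σ_k α_k B_k with B_k ∈ {−1, +1}, so that storage and matrix products operate on 1-bit tensors plus a few shared scales. The sketch below illustrates only this multi-kernel representation via a greedy least-squares fit; the function name boolean_approx and the fitting rule are our own assumptions for illustration, not the paper's method, which fine-tunes directly in the Boolean domain without latent full-precision weights.

```python
import torch

def boolean_approx(W: torch.Tensor, num_kernels: int = 2):
    """Greedy residual fit of W ~= sum_k alpha_k * B_k with B_k in {-1, +1}.

    A minimal sketch of a multi-kernel Boolean representation; the paper's
    actual parameterization and Boolean-domain training rule may differ.
    """
    residual = W.clone()
    alphas, kernels = [], []
    for _ in range(num_kernels):
        B = torch.sign(residual)           # sign kernel in {-1, 0, +1}
        B[B == 0] = 1.0                    # map exact zeros to +1
        alpha = (residual * B).mean()      # since B*B == 1, the scale that
                                           # minimizes ||R - a*B||^2 is mean(R*B)
        alphas.append(alpha)
        kernels.append(B)
        residual = residual - alpha * B    # next kernel fits what is left over
    W_hat = sum(a * B for a, B in zip(alphas, kernels))
    return alphas, kernels, W_hat

# Usage: two kernels already capture much of a random Gaussian weight matrix.
torch.manual_seed(0)
W = torch.randn(256, 256)
_, _, W_hat = boolean_approx(W, num_kernels=2)
print(f"relative error: {(W - W_hat).norm() / W.norm():.3f}")
```

With K kernels, each weight costs K bits plus K shared scalar scales, versus 16 bits per weight in half precision, which is where the complexity reduction at inference comes from.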

Related benchmarks

Task                   Dataset        Metric      Result  Rank
Language Modeling      WikiText-2     Perplexity  11.03   2839
Commonsense Reasoning  HellaSwag      Accuracy    65.6    1891
Language Modeling      WikiText-2     Perplexity  5.14    1624
Language Modeling      C4             Perplexity  6.94    1422
Commonsense Reasoning  WinoGrande     Accuracy    61.7    1085
Language Modeling      C4             Perplexity  8.53    1071
Question Answering     ARC-Challenge  Accuracy    34.2    906
Question Answering     ARC-Easy       Accuracy    44.8    416
Question Answering     PIQA           Accuracy    75.0    374
Language Modeling      WikiText-2     Perplexity  5.35    149
(Showing 10 of 19 benchmark rows.)
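For reference on the metrics above: perplexity is the exponential of the mean per-token negative log-likelihood on the test corpus (lower is better), while accuracy is the percentage of correctly answered items (higher is better). A minimal worked example with toy numbers, not taken from the table:

```python
import math

# Perplexity = exp(mean per-token negative log-likelihood); lower is better.
# Toy per-token NLLs in nats, for illustration only.
token_nlls = [2.1, 2.4, 1.9, 2.6]
ppl = math.exp(sum(token_nlls) / len(token_nlls))
print(f"perplexity = {ppl:.2f}")  # perplexity = 9.49
```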
