Highly Efficient and Effective LLMs with Multi-Boolean Architectures

About

Weight binarization has emerged as a promising strategy for reducing the complexity of large language models (LLMs). Existing approaches fall into two camps: post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct fine-tuning of LLMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both fine-tuning and inference. Extensive experiments across diverse LLMs show that our method outperforms recent ultra-low-bit quantization and binarization techniques.

Ba-Hien Tran, Van Minh Nguyen • 2025
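The core idea is to replace a full-precision weight matrix with a small sum of Boolean (sign) kernels, W ≈ Σ_k α_k B_k with B_k ∈ {−1, +1}, so that storage and matrix products operate on 1-bit tensors plus a few shared scales. The sketch below illustrates only this multi-kernel representation via a greedy least-squares fit; the function name boolean_approx and the fitting rule are our own assumptions for illustration, not the paper's method, which fine-tunes directly in the Boolean domain without latent full-precision weights.

```python
import torch

def boolean_approx(W: torch.Tensor, num_kernels: int = 2):
    """Greedy residual fit of W ~= sum_k alpha_k * B_k with B_k in {-1, +1}.

    A minimal sketch of a multi-kernel Boolean representation; the paper's
    actual parameterization and Boolean-domain training rule may differ.
    """
    residual = W.clone()
    alphas, kernels = [], []
    for _ in range(num_kernels):
        B = torch.sign(residual)           # sign kernel in {-1, 0, +1}
        B[B == 0] = 1.0                    # map exact zeros to +1
        alpha = (residual * B).mean()      # since B*B == 1, the scale that
                                           # minimizes ||R - a*B||^2 is mean(R*B)
        alphas.append(alpha)
        kernels.append(B)
        residual = residual - alpha * B    # next kernel fits what is left over
    W_hat = sum(a * B for a, B in zip(alphas, kernels))
    return alphas, kernels, W_hat

# Usage: two kernels already capture much of a random Gaussian weight matrix.
torch.manual_seed(0)
W = torch.randn(256, 256)
_, _, W_hat = boolean_approx(W, num_kernels=2)
print(f"relative error: {(W - W_hat).norm() / W.norm():.3f}")
```

With K kernels, each weight costs K bits plus K shared scalar scales, versus 16 bits per weight in half precision, which is where the complexity reduction at inference comes from.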

Related benchmarks

Task                   Dataset        Metric      Result  Rank
Language Modeling      WikiText-2     Perplexity  11.03   2839
Commonsense Reasoning  HellaSwag      Accuracy    65.6    1891
Language Modeling      WikiText-2     Perplexity  5.14    1624
Language Modeling      C4             Perplexity  6.94    1422
Commonsense Reasoning  WinoGrande     Accuracy    61.7    1085
Language Modeling      C4             Perplexity  8.53    1071
Question Answering     ARC-Challenge  Accuracy    34.2    906
Question Answering     ARC-Easy       Accuracy    44.8    416
Question Answering     PIQA           Accuracy    75.0    374
Language Modeling      WikiText-2     Perplexity  5.35    149
(Showing 10 of 19 benchmark rows.)
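For reference on the metrics above: perplexity is the exponential of the mean per-token negative log-likelihood on the test corpus (lower is better), while accuracy is the percentage of correctly answered items (higher is better). A minimal worked example with toy numbers, not taken from the table:

```python
import math

# Perplexity = exp(mean per-token negative log-likelihood); lower is better.
# Toy per-token NLLs in nats, for illustration only.
token_nlls = [2.1, 2.4, 1.9, 2.6]
ppl = math.exp(sum(token_nlls) / len(token_nlls))
print(f"perplexity = {ppl:.2f}")  # perplexity = 9.49
```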
