AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization

About

Post-training weight-only quantization to 4 bits is widely used to reduce the memory and compute costs of large language model inference. Existing PTQ methods, such as AWQ and GPTQ, improve how weights are mapped onto a fixed 4-bit grid through scaling, clipping, or error compensation. To further improve accuracy, methods such as OmniQuant and QuIP# use gradient-assisted algorithms at the cost of hours of quantization time. In this work, we propose AAAC (Activation-Aware Adaptive Codebooks), a lightweight method for 4-bit LLM weight quantization. AAAC replaces the fixed scalar codebook used in standard quantization with two small learned scalar codebooks (64 bytes) per layer. Each group of weights selects the codebook that minimizes its activation-weighted reconstruction error. The choice is encoded in the otherwise unused sign bit of the group's positive scale, so it adds no storage overhead. AAAC completes in 3 to 30 minutes on a single GPU and adds no memory beyond the model itself. We evaluate against AWQ, GPTQ, IF4, GPTVQ, OmniQuant, SqueezeLLM, and QuIP# across model families. AAAC outperforms these baselines while requiring orders of magnitude less quantization time.
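The per-group codebook selection described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `quantize_group` and `dequantize_group`, the absmax scale, the per-element importance vector `act_weight`, and the two example codebooks are all assumptions made for illustration, and the codebook learning step is not shown. With 16 entries per codebook stored at 2 bytes each, two codebooks would occupy 64 bytes, which matches the figure quoted above, although the listing does not state the storage layout.

```python
import numpy as np

def quantize_group(w, act_weight, codebooks):
    """Quantize one weight group with whichever codebook gives the lower
    activation-weighted squared error (hypothetical helper, absmax scale assumed)."""
    scale = float(np.max(np.abs(w)))
    if scale == 0.0:
        scale = 1.0  # all-zero group: any positive scale reconstructs it
    best = None
    for cb_id, cb in enumerate(codebooks):
        # Nearest codebook entry for every normalized weight (4-bit index, 0..15).
        idx = np.argmin(np.abs(w[:, None] / scale - cb[None, :]), axis=1)
        err = float(np.sum(act_weight * (w - scale * cb[idx]) ** 2))
        if best is None or err < best[0]:
            best = (err, cb_id, idx)
    _, cb_id, idx = best
    # The scale is always positive, so its sign bit is free to carry the codebook id.
    signed_scale = -scale if cb_id == 1 else scale
    return idx.astype(np.uint8), signed_scale

def dequantize_group(idx, signed_scale, codebooks):
    """Recover the codebook id from the sign bit, then look up and rescale."""
    cb_id = int(np.signbit(signed_scale))
    return abs(signed_scale) * codebooks[cb_id][idx]

# Two illustrative 16-entry codebooks (assumed, not learned): a uniform grid
# and a grid that is denser near zero.
u = np.linspace(-1.0, 1.0, 16)
codebooks = [u, np.sign(u) * u ** 2]

rng = np.random.default_rng(0)
w = rng.normal(size=32)                      # one group of 32 weights
act_weight = rng.uniform(0.1, 1.0, size=32)  # e.g. mean squared input activation
idx, s = quantize_group(w, act_weight, codebooks)
w_hat = dequantize_group(idx, s, codebooks)
```

In this sketch the decoder needs only the 4-bit indices and the signed scale, so the codebook choice costs no extra bits per group.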

Beshr IslamBouli, David Jin • 2026

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-2 (test)	PPL 4.94	2416
Language Modeling	C4 (val)	PPL 10.18	908
Question Answering	ARC Challenge	Accuracy (ARC) 61.9	631
Perplexity	C4	Perplexity 9.84	137
Language Modeling	WikiText-2	Perplexity 4.94	105
Physical Reasoning	PIQA	Accuracy 82.5	90
General Reasoning	BIG-Bench Hard	--	68
Multistep Reasoning	MuSR	Accuracy 51.6	53
Expert-Level Question Answering	GPQA	--	25

