QAM-W: Joint 2D Codebook Quantization for LLM Weights via Hadamard Rotation and Activation-Aware Scaling
About
Scalar post-training quantizers discard pairwise coordinate structure within weight rows. We introduce QAM-W (Quadrature Amplitude Modulation for Weights), a codec that recovers this structure: each row is L2-normalized, block-Hadamard rotated, paired into 2D coordinates, and quantized against a single Lloyd-Max codebook trained on the unit circular Gaussian, with activation-aware per-channel scaling. In a cross-model study spanning five LLMs from four families (1.1B–13B parameters) and eight quantized configurations, the activation-aware variant at $\approx 5.5$ bpw stays within $\pm 0.4\%$ of BF16 WikiText-2 perplexity on every model, matching the SmoothQuant W8A8 quality envelope at $32\%$ fewer weight bits. Joint 2D coding outperforms polar (amplitude $\times$ phase) coding by 2–15 pp $\Delta$PPL at equal bitrate, and paired KL against BF16 tracks $\Delta$PPL% at Spearman $\rho = 0.99$ across 37 (method, model) rows, consistent with a monotone composite bound from codec distortion to KL divergence. A 3.5 bpw variant is competitive on quantization-tolerant architectures. At strict 4 bpw, the rotated-codebook frontier method QTIP outperforms QAM-W; the contribution is the quality-preserving 5–6 bpw band.
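The per-row pipeline described above (L2-normalize, block-Hadamard rotate, pair into 2D, quantize against one shared codebook) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the block size of 64, the 64-point codebook (6 bits per pair, i.e. 3 bits per weight before side information), and the reading of "unit circular Gaussian" as a 2D Gaussian with unit variance per coordinate are all hypothetical choices. The codebook is fitted by plain Lloyd iterations on samples, and the activation-aware per-channel scaling is omitted.

```python
import numpy as np

def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def nearest(points, codebook):
    """Index of the closest codebook entry (squared Euclidean) for each 2D point."""
    d2 = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def train_codebook_2d(num_points, num_samples=20000, iters=25, seed=0):
    """Lloyd iterations on samples from a 2D Gaussian (assumed unit variance per axis)."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((num_samples, 2))
    codebook = samples[:num_points].copy()
    for _ in range(iters):
        idx = nearest(samples, codebook)
        for k in range(num_points):
            members = samples[idx == k]
            if len(members):  # leave an empty cell's centroid unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def encode_row(w, codebook, block=64):
    """Return (row norm, codebook indices); len(w) must be a multiple of `block`."""
    norm = np.linalg.norm(w)
    rotated = (w / norm).reshape(-1, block) @ hadamard(block).T
    # A unit-norm row has mean squared coordinate 1/len(w); rescale to unit variance.
    pairs = (rotated * np.sqrt(w.size)).reshape(-1, 2)
    return norm, nearest(pairs, codebook)

def decode_row(norm, idx, codebook, n, block=64):
    """Invert encode_row: look up pairs, undo the scaling, rotation, and normalization."""
    rotated = codebook[idx].reshape(-1, block) / np.sqrt(n)
    return norm * (rotated @ hadamard(block)).reshape(-1)

# Demo on a synthetic row (random data, not real model weights).
codebook = train_codebook_2d(num_points=64)
w = np.random.default_rng(1).standard_normal(256)
norm, idx = encode_row(w, codebook)
w_hat = decode_row(norm, idx, codebook, n=w.size)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because the Hadamard matrix here is orthonormal, the rotation preserves the row norm, so a single stored scalar per row suffices to undo the normalization; the rotation's role is to make the coordinates look closer to Gaussian, which is what lets one fixed codebook serve every row.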
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 4.2612 | 2333 |
| Language Modeling | Mistral-7B | -- | -- | 24 |
| Language Understanding | MMLU | -- | -- | 21 |
| Zero-shot Evaluation | LM Evaluation Harness PIQA, HellaSwag, COPA, RTE, OpenBookQA, LAMBADA-OpenAI | Average Score | 75.51 | 16 |
| Language Modeling | WikiText-2 stride (test) | bpw | 3.504 | 8 |
| Language Modeling | TinyLlama 1.1B | ΔPPL (%) | 0.4 | 8 |
| Language Modeling | Qwen 3B 2.5 | ΔPPL (%) | 19.6 | 8 |
| Language Modeling | Llama-2 7B base | Perplexity | 4.7664 | 7 |