
WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization

About

Quantization is an effective approach to reduce the memory footprint and inference cost of large language models (LLMs), yet maintaining performance in the ultra-low-bit regime remains challenging. Existing post-training methods often suffer from severe accuracy degradation, while quantization-aware training requires costly retraining and additional resources. Moreover, most mixed-precision strategies rely on coarse-grained or heuristic sensitivity analysis that overlooks fine-grained variations within weight matrices. We propose WINDQuant, a reinforcement-learning-based allocation controller for ultra-low-bit LLM quantization. Rather than introducing another low-level quantization operator, WINDQuant learns how to assign bit-widths and quantization treatments to fine-grained column chunks under a global storage budget. By operating at the column-chunk level, WINDQuant enables flexible and fine-grained precision assignment within layers under a global target bit-width. The implementation combines PPO with activation-aware calibration, lightweight per-unit quantizer fitting, and explicit effective-bit accounting of the learned mixed-precision plan. Experiments on LLaMA models demonstrate that WINDQuant achieves competitive performance in ultra-low-bit settings while reducing optimization overhead relative to retraining-based approaches, highlighting reinforcement learning as a practical controller for adaptive mixed-precision quantization.
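The "effective-bit accounting" mentioned above can be illustrated with a small sketch. This is not the authors' implementation: the `Chunk` structure, the `meta_bits` overhead (assumed here to be 32 bits per chunk, for example an fp16 scale plus an fp16 zero-point), and the function names are all hypothetical. It only shows how the average bits per weight of a mixed-precision plan over column chunks might be computed and checked against a global target bit-width.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One column chunk of a weight matrix (hypothetical structure)."""
    rows: int            # rows of the weight matrix the chunk spans
    cols: int            # number of columns in the chunk
    bits: int            # bit-width assigned to the chunk's weights
    meta_bits: int = 32  # assumed per-chunk metadata overhead (scale, zero-point)


def effective_bits(chunks: List[Chunk]) -> float:
    """Average storage bits per weight, counting per-chunk metadata."""
    total_weights = sum(c.rows * c.cols for c in chunks)
    total_bits = sum(c.rows * c.cols * c.bits + c.meta_bits for c in chunks)
    return total_bits / total_weights


def within_budget(chunks: List[Chunk], target_bits: float) -> bool:
    """True if the plan's effective bit-width does not exceed the global target."""
    return effective_bits(chunks) <= target_bits


# Example plan: three 2-bit chunks and one 4-bit chunk of equal size.
plan = [
    Chunk(rows=4096, cols=128, bits=2),
    Chunk(rows=4096, cols=128, bits=2),
    Chunk(rows=4096, cols=128, bits=2),
    Chunk(rows=4096, cols=128, bits=4),
]
print(effective_bits(plan))      # slightly above 2.5 because of metadata
print(within_budget(plan, 3.0))  # True
```

Counting metadata in the total is what makes the figure an effective bit-width rather than a nominal one: a plan whose weights average exactly 2.5 bits still lands slightly above a 2.5-bit budget once per-chunk overhead is included.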

Phong Nam Huu Nguyen, Khoi M. Le, Cong-Duy T Nguyen, Anh Tuan Luu, Thong Thanh Nguyen, Tho Quan • 2026

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText-2 (test) | -- | 2333
Commonsense Reasoning | Commonsense Reasoning Suite (ARC-c, BoolQ, PIQA, HellaSwag, WinoGrande) zero-shot | ARC-c Accuracy: 49.9 | 35
Zero-shot Evaluation | Evaluation Benchmarks Zero-shot | Average Accuracy: 70.2 | 34
Zero-shot Evaluation | Zero-shot Evaluation Suite | -- | 14
