
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization

About

We introduce Qronos -- a new state-of-the-art post-training quantization algorithm that sequentially rounds and updates neural network weights. Qronos not only explicitly corrects errors due to both weight and activation quantization, but also errors resulting from quantizing previous layers. Our iterative algorithm is based on an interpretable and disciplined optimization framework that subsumes and surpasses existing data-driven approaches. At each step, Qronos alternates between error correction and diffusion via optimal update rules. Importantly, we prove that Qronos admits an efficient implementation that uses the Cholesky decomposition for solving least-squares problems. We also demonstrate that Qronos is compatible with existing transformation techniques such as Hadamard-based incoherence processing and weight-activation scaling equalization, among others. We evaluate Qronos using recent autoregressive language generation models in the Llama3 family; Qronos consistently outperforms previous state-of-the-art adaptive rounding methods when quantizing the weights, activations, and/or KV caches.
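The alternation the abstract describes, rounding one weight at a time and then diffusing the resulting error into the not-yet-quantized weights via a least-squares update solved with a Cholesky factorization, can be sketched roughly as follows. This is a hypothetical simplification for a single weight vector against a calibration batch; Qronos's actual update rules also correct for activation quantization and previous-layer error, which this sketch omits.

```python
import numpy as np

def sequential_quantize(w, X, grid, damp=1e-2):
    """Sketch: round coordinates of w one at a time; after each
    rounding, update the remaining coordinates by least squares so
    that X.T @ w is matched as closely as possible.

    w    : (d,) weight vector
    X    : (d, n) calibration inputs (columns are samples)
    grid : 1-D array of representable quantization levels
    """
    d = w.size
    # Damped Gram matrix of the calibration inputs.
    H = X @ X.T + damp * np.eye(d)
    q = np.zeros_like(w)
    r = w.copy()  # running copy of the not-yet-quantized weights
    for j in range(d):
        # Error correction: round coordinate j to the nearest level.
        q[j] = grid[np.abs(r[j] - grid).argmin()]
        e = r[j] - q[j]
        # Error diffusion: shift the remaining coordinates by the
        # least-squares-optimal update, i.e. solve
        #   H[j+1:, j+1:] @ delta = H[j+1:, j] * e
        # using a Cholesky factorization of the trailing block.
        Hs = H[j + 1:, j + 1:]
        if Hs.size:
            L = np.linalg.cholesky(Hs)
            delta = np.linalg.solve(L.T, np.linalg.solve(L, H[j + 1:, j] * e))
            r[j + 1:] += delta
    return q
```

The per-step update comes from minimizing the layer-output error `||X.T @ (q - w)||**2` over the free coordinates with the already-rounded prefix held fixed; an efficient implementation would reuse one factorization across steps rather than refactorizing the trailing block, as in the paper's Cholesky-based formulation.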

Shihao Zhang, Haoyu Zhang, Ian Colbert, Rayan Saab • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity: 7.1 | 1875 |
| Zero-shot Evaluation | Downstream Tasks Zero-shot | Accuracy: 72.2 | 278 |
| Zero-shot Reasoning | ARC-e, Winogrande, HellaSwag, PIQA | Normalized Avg Accuracy: 48 | 36 |
