
Optimal Formats for Weight Quantisation

About

Weight quantisation is an essential technique for enabling efficient training and deployment of modern deep learning models. However, the recipe book of quantisation formats is large, and formats are often chosen empirically. In this paper, we propose a framework for the systematic design and analysis of quantisation formats. By connecting format design with classical quantisation theory, we show that the strong practical performance of popular formats comes from their ability to represent values using variable-length codes. We frame the problem as minimising the KL divergence between original and quantised model outputs under a model size constraint, which can be approximated by minimising the squared quantisation error, a well-studied problem for which entropy-constrained quantisers with variable-length codes are optimal. We develop non-linear quantisation curves for block-scaled data across multiple distribution families and observe that these formats, along with sparse outlier formats, consistently outperform fixed-length formats, indicating that they also exploit variable-length encoding. Finally, by using the relationship between the Fisher information and the KL divergence, we derive the optimal allocation of bit-widths to individual parameter tensors across the model's layers, saving up to 0.25 bits per parameter when applied to large language models.
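To make the block-scaled setting concrete, here is a minimal sketch of absmax block-scaled quantisation and its squared-error proxy. This is an illustrative baseline with a fixed linear integer grid, not the paper's non-linear quantisation curves or sparse outlier formats; the function name, block size, and bit-width are assumptions chosen for the example.

```python
import numpy as np

def quantise_block_scaled(x, block_size=32, bits=4):
    """Illustrative absmax block-scaled quantisation: each block of
    weights shares one scale, and values are rounded to a symmetric
    signed integer grid. Returns the dequantised approximation."""
    levels = 2 ** (bits - 1) - 1              # e.g. grid of +/-7 for 4 bits
    x = x.reshape(-1, block_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / levels
    scale[scale == 0] = 1.0                   # guard all-zero blocks
    q = np.round(x / scale).clip(-levels, levels)
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096)                     # stand-in for one weight tensor
w_hat = quantise_block_scaled(w)
mse = np.mean((w - w_hat) ** 2)               # squared quantisation error proxy
```

Under the paper's framing, minimising this per-tensor squared error (weighted appropriately) approximates minimising the KL divergence between the original and quantised model outputs.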

Douglas Orr, Luka Ribar, Carlo Luschi • 2025

Related benchmarks

Task                                    Dataset         Metric     Result   Rank
Commonsense Reasoning                   HellaSwag       Accuracy   79.8     1460
Commonsense Reasoning                   WinoGrande      Accuracy   72.2     776
Question Answering                      ARC Challenge   Accuracy   75.9     749
Question Answering                      OpenBookQA      Accuracy   72.8     465
Question Answering                      ARC Easy        Accuracy   90.2     386
Boolean Question Answering              BoolQ           Accuracy   73.8     307
Reading Comprehension                   BoolQ           Accuracy   79.4     219
Question Answering                      CommonsenseQA   Accuracy   65.4     143
Commonsense Reasoning                   SIQA            Accuracy   61.5     96
Social Interaction Question Answering   SIQA            Accuracy   58.6     85

(10 of 15 rows shown)
