WaterSIC: information-theoretically (near) optimal linear layer quantization

About

This paper considers the problem of converting a given dense linear layer to low precision. The tradeoff between compressed length and output discrepancy is analyzed information-theoretically (IT). It is shown that the popular GPTQ algorithm may have an arbitrarily large gap to the IT limit. To alleviate this problem, a novel algorithm, termed "WaterSIC", is proposed and is shown to be within a rate gap of 0.255 bits of the IT limit, uniformly over all possible covariance matrices of input activations. The key innovation of WaterSIC is to allocate different quantization rates to different columns (in-features) of the weight matrix, mimicking the classical IT solution known as "waterfilling". Applying WaterSIC to the Llama and Qwen families of LLMs establishes new state-of-the-art performance for all quantization rates from 1 to 4 bits.
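For readers unfamiliar with the term, the classical "waterfilling" idea mentioned above can be illustrated with textbook reverse water-filling for independent Gaussian components. The sketch below is not the WaterSIC algorithm and is not taken from the paper. The function name, the bisection search, and the example variances are assumptions chosen for illustration. Given component variances and a total mean-squared-error budget, it finds a water level theta such that the per-component distortions min(theta, variance) sum to the budget, then assigns each component the rate max(0, 0.5 * log2(variance / theta)) bits.

```python
import math


def reverse_waterfilling(variances, total_distortion):
    """Textbook reverse water-filling (illustrative sketch, not WaterSIC).

    variances: positive variances of independent Gaussian components.
    total_distortion: total mean-squared-error budget across components.
    Returns (theta, rates), where rates[i] is in bits per sample.
    """
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    if not 0 < total_distortion <= sum(variances):
        raise ValueError("distortion budget must lie in (0, sum(variances)]")

    # sum(min(theta, v)) is nondecreasing in theta, so bisection finds
    # the water level at which the budget is met.
    lo, hi = 0.0, max(variances)
    for _ in range(200):
        theta = (lo + hi) / 2
        if sum(min(theta, v) for v in variances) < total_distortion:
            lo = theta
        else:
            hi = theta
    theta = (lo + hi) / 2

    # Components with variance below the water level receive zero rate.
    rates = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    return theta, rates


# Hypothetical example: three components with very different variances.
theta, rates = reverse_waterfilling([4.0, 1.0, 0.25], total_distortion=1.5)
```

In this hypothetical example the water level settles between the two smallest variances, so the weakest component gets no bits while the strongest gets the most. That unequal allocation is the behavior the abstract says WaterSIC mimics across columns of the weight matrix.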

Egor Lifar, Semyon Savkin, Or Ordentlich, Yury Polyanskiy • 2026

Related benchmarks

Task                               Dataset        Result                  Rank
Language Modeling                  WikiText-2     Perplexity (PPL) 9.79   2320
Commonsense Reasoning              HellaSwag      Accuracy 62.87          1896
Question Answering                 ARC Challenge  --                      906
Commonsense Reasoning              PIQA           Accuracy 74.97          757
Physical Commonsense Reasoning     PIQA           Accuracy 76.99          696
Question Answering                 ARC Easy       Accuracy 61.7           597
Commonsense Reasoning              WinoGrande     Accuracy 70.09          453
Multi-task Language Understanding  MMLU           Accuracy 72.77          353
Question Answering                 SciQ           --                      283
Commonsense Reasoning              SocialIQA      Accuracy 43.4           158
Showing 10 of 13 rows
