LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs

About

As large language models continue to scale, low-bit weight-only post-training quantization (PTQ) offers a practical route to memory-efficient deployment. Although block-wise PTQ can match the full-precision (FP) baseline on basic language modeling and understanding, its quality degrades on generative tasks, especially for longer responses and extended chains of thought, which are critical for task accuracy. We attribute this shortfall to two factors: (i) the omission of the unembedding layer (the LM head) from block-wise optimization and (ii) the reliance on the mean squared error (MSE) objective. Both factors cause the token probability distribution of the quantized model to drift from that of the FP model, yielding notable accuracy drops on text generation benchmarks. To correct this discrepancy, we introduce Logit-aware Final-block Quantization (LFQ), a simple yet effective enhancement to block-wise PTQ that quantizes the final Transformer block by minimizing the cross-entropy between the logits of the FP model and those of its quantized counterpart. By aligning token probabilities at the logit level in the final block, LFQ consistently improves accuracy on complex generation tasks over state-of-the-art block-wise PTQ across diverse model families, while maintaining parity with FP baselines on language modeling and understanding.
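
To make the contrast between the two objectives concrete, here is a minimal, standard-library-only sketch for a single token position. It is a toy illustration, not the authors' implementation: the function names and the four-entry logit vectors are hypothetical, and MSE is applied directly to logits for simplicity, whereas block-wise PTQ applies it to block outputs. The sketch assumes the cross-entropy is taken between the softmax distribution of the FP logits and that of the quantized logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def logit_cross_entropy(fp_logits, q_logits):
    """Cross-entropy H(p_fp, p_q) between the two token distributions."""
    p_fp = softmax(fp_logits)
    log_p_q = log_softmax(q_logits)
    return -sum(p * lq for p, lq in zip(p_fp, log_p_q))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical FP logits for a 4-token vocabulary.
fp_logits = [4.0, 3.5, 0.0, -2.0]

# Two hypothetical quantization errors with identical MSE:
# one swaps the two most likely tokens, the other only perturbs the tail.
q_swap = [3.5, 4.0, 0.0, -2.0]
q_tail = [4.0, 3.5, 0.5, -2.5]

mse_swap = mse(fp_logits, q_swap)
mse_tail = mse(fp_logits, q_tail)
ce_swap = logit_cross_entropy(fp_logits, q_swap)
ce_tail = logit_cross_entropy(fp_logits, q_tail)
ce_floor = logit_cross_entropy(fp_logits, fp_logits)  # entropy of the FP distribution

print(f"MSE  swap={mse_swap:.4f}  tail={mse_tail:.4f}")
print(f"CE   swap={ce_swap:.4f}  tail={ce_tail:.4f}  floor={ce_floor:.4f}")
```

In this toy case MSE cannot tell the two errors apart, while the cross-entropy penalizes the error that changes which token is most likely more heavily than the one confined to low-probability tokens. That is one way to read the claim that an MSE objective can leave the token probability distribution misaligned.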

Jung Hyun Lee, June Yong Yang, Jungwook Choi, Eunho Yang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Language Modeling | WikiText2 | Perplexity | 5.62 | 3785 |
| Language Understanding | MMLU | MMLU Accuracy | 80.25 | 147 |
| Instruction Following | IFEval | IFEval Score | 78 | 87 |
| Instruction Following | IFEval | Avg. Score (IFEval) | 71.44 | 45 |
| Language Understanding | MMLU | MMLU Score | 66.97 | 40 |
| Text Generation | GSM8K | Accuracy | 81.8 | 35 |
| Text Generation | IFEval | Accuracy | 72.46 | 23 |
| Mathematical Reasoning | MATH 500 | Accuracy (Avg@8) | 88.4 | 10 |
| Instruction Following | IFEval | Accuracy (Greedy) | 82.99 | 4 |
| Mathematical Reasoning | AIME 25 | Greedy Accuracy | 60 | 4 |