
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

About

Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, their large model size poses challenges for practical deployment. To address this, Quantization-Aware Training (QAT) has become increasingly popular; however, current QAT methods for generative models incur a noticeable loss of accuracy. To counteract this, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, token-scaled logit distillation, prevents overfitting and enables superior learning from both the teacher model and the ground truth. This work presents the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity, and it achieves enhanced accuracy on tasks such as common-sense QA and arithmetic reasoning as well as natural language understanding. Our code is available at https://github.com/aiha-lab/TSLD.
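The idea of token-scaled logit distillation can be illustrated with a minimal NumPy sketch. This is a hedged illustration, not the paper's implementation: it assumes the per-token distillation term is a KL divergence between teacher and student token distributions, with each token's term weighted by the teacher's confidence in the ground-truth token (normalized over the sequence). The function name and the `1e-12` numerical-stability constant are choices made here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_scaled_logit_distillation(student_logits, teacher_logits, labels):
    """Sketch of a token-scaled distillation loss.

    student_logits, teacher_logits: (seq_len, vocab) arrays of logits.
    labels: (seq_len,) array of ground-truth token ids.
    """
    t = softmax(teacher_logits)  # teacher token distributions
    s = softmax(student_logits)  # student token distributions
    # Per-token KL(teacher || student), summed over the vocabulary.
    kl = (t * (np.log(t + 1e-12) - np.log(s + 1e-12))).sum(axis=-1)
    # Token scale: teacher's probability of the ground-truth token,
    # normalized so the scales sum to one over the sequence.
    conf = t[np.arange(len(labels)), labels]
    scale = conf / conf.sum()
    return float((scale * kl).sum())
```

Under this weighting, tokens the teacher predicts confidently (and correctly) dominate the loss, while tokens where the teacher itself is uncertain contribute less, which is one way to curb overfitting to noisy teacher logits.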

Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi• 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | PTB | Perplexity | 11 | 650 |
| Language Modeling | PTB (test) | Perplexity | 11.6 | 471 |
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy | 94.05 | 416 |
| Arithmetic Reasoning | GSM8K | Accuracy | 26.23 | 10 |
| Common-sense QA | PIQA | Accuracy | 75.62 | 10 |
| Common-sense QA | OpenBookQA | Accuracy | 46.81 | 10 |
| Common-sense QA | ARC Easy | Accuracy | 59.39 | 4 |
| Common-sense QA | ARC Challenge | Accuracy | 33.45 | 4 |
