
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models

About

This paper reveals the phenomenon of parameter heterogeneity in large language models (LLMs). We find that a small subset of "cherry" parameters exerts a disproportionately large influence on model performance, while the vast majority of parameters have minimal impact. This heterogeneity is prevalent across model families, scales, and types. Motivated by this observation, we propose CherryQ, a novel quantization method that unifies the optimization of mixed-precision parameters. CherryQ identifies and preserves the critical cherry parameters in high precision while aggressively quantizing the remaining parameters to low precision. Extensive experiments demonstrate the effectiveness of CherryQ, which outperforms existing quantization approaches in both perplexity and downstream task performance. Notably, our 3-bit quantized Vicuna-1.5 achieves performance competitive with its 16-bit counterpart.
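The mixed-precision idea in the abstract can be sketched in a few lines: score each weight's importance, keep the top-scoring "cherry" weights in full precision, and uniformly quantize the rest to a low bit width. The impact score, the `cherry_frac` ratio, and the function names below are illustrative assumptions; the paper's actual impact criterion and quantization procedure differ.

```python
import numpy as np

def quantize_uniform(w, bits=3):
    # Symmetric uniform quantization: round to 2**(bits-1) - 1 levels per sign.
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

def cherry_quantize(weights, impact, cherry_frac=0.01, bits=3):
    """Mixed-precision sketch (hypothetical): keep the highest-impact
    'cherry' weights in full precision, quantize the rest to `bits` bits."""
    flat = weights.ravel().copy()
    k = max(1, int(cherry_frac * flat.size))
    cherry_idx = np.argsort(impact.ravel())[-k:]   # indices of top-impact weights
    mask = np.zeros(flat.size, dtype=bool)
    mask[cherry_idx] = True
    out = quantize_uniform(flat, bits)
    out[mask] = flat[mask]                         # restore cherry weights untouched
    return out.reshape(weights.shape), mask.reshape(weights.shape)
```

As a usage example, scoring weights by magnitude (`impact = np.abs(w)`) and setting `cherry_frac=0.05` keeps 5% of weights in full precision while the remaining 95% collapse onto at most 2**bits quantization levels.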

Wanyun Cui, Qianle Wang • 2024

Related benchmarks

Task                              | Dataset              | Result                | Rank
Language Modeling                 | WikiText-2 (test)    | PPL 5.21              | 1541
Language Modeling                 | C4                   | Perplexity 6.56       | 1182
Multi-task Language Understanding | MMLU                 | --                    | 842
Language Modeling                 | WikiText-2           | Perplexity (PPL) 4.99 | 841
Language Modeling                 | C4 (val)             | PPL 6.76              | 392
Language Modeling and Reasoning   | Open LLM Leaderboard | ARC 58.6              | 33
