
An Efficient Multilingual Language Model Compression through Vocabulary Trimming

About

Multilingual language models (LMs) have become a powerful tool in NLP, especially for non-English languages. Nevertheless, the parameter counts of multilingual LMs remain large because of the large embedding matrix needed to cover tokens from many languages. Monolingual LMs, by contrast, can be trained with a language-specific vocabulary only, but building a high-quality LM from scratch requires a large budget and the availability of reliable corpora in the target language. In this paper, we propose vocabulary trimming (VT), a method that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens. In principle, VT can compress any existing multilingual LM into a monolingual LM for any language the multilingual LM covers. In our experiments, we show that VT retains the original performance of the multilingual LM while being smaller in size; in general, around 50% of the original vocabulary size is enough. The evaluation is performed on four NLP tasks (two generative and two classification tasks) with four widely used multilingual LMs in seven languages. Finally, we show that this methodology keeps the best of both the monolingual and multilingual worlds: the trimmed models stay as small as monolingual ones without requiring retraining from scratch, and potentially harmful social biases are even reduced.
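The core of VT as described above is selecting the subset of vocabulary entries relevant to a target language and keeping only their rows of the embedding matrix. The sketch below is a minimal, hypothetical illustration of that idea using NumPy: it keeps tokens that appear in a target-language corpus (plus special tokens) and remaps them to new contiguous IDs. The function name, the set-membership selection criterion, and the `always_keep` parameter are assumptions for illustration; the paper's actual pipeline also has to update the tokenizer and any tied output head, which is omitted here.

```python
import numpy as np

def trim_vocabulary(embeddings, vocab, corpus_tokens,
                    always_keep=("<s>", "</s>", "<unk>")):
    """Illustrative vocabulary trimming (VT).

    embeddings    : (V, d) embedding matrix of the multilingual LM.
    vocab         : dict mapping token string -> row index in `embeddings`.
    corpus_tokens : set of tokens observed in the target-language corpus.
    always_keep   : special tokens that are retained regardless of the corpus.

    Returns a smaller embedding matrix and a remapped vocabulary that
    together cover only the target language.
    """
    # Keep a token if it occurs in the target-language corpus or is special.
    kept = [tok for tok in vocab if tok in corpus_tokens or tok in always_keep]
    # Assign new contiguous IDs in the trimmed vocabulary.
    new_vocab = {tok: i for i, tok in enumerate(kept)}
    # Gather the corresponding rows of the original embedding matrix.
    old_ids = [vocab[tok] for tok in kept]
    new_embeddings = embeddings[old_ids]
    return new_embeddings, new_vocab

# Toy example: a 5-token "multilingual" vocabulary trimmed to English tokens.
vocab = {"<s>": 0, "</s>": 1, "hello": 2, "bonjour": 3, "world": 4}
embeddings = np.arange(20, dtype=float).reshape(5, 4)
english_corpus = {"hello", "world"}

new_emb, new_vocab = trim_vocabulary(embeddings, vocab, english_corpus,
                                     always_keep=("<s>", "</s>"))
# "bonjour" is dropped; 4 of 5 rows remain and each kept row is unchanged.
```

Because embedding rows are copied verbatim, the trimmed model produces identical representations for every surviving token, which is why VT can retain the original multilingual LM's performance without any retraining.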

Asahi Ushio, Yi Zhou, Jose Camacho-Collados • 2023

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Commonsense Reasoning | HellaSwag | - | 1891
Instruction Following | IFEval | - | 625
Logical Reasoning | BBH | Accuracy 37 | 201
Reading Comprehension | BoolQ | Score 65 | 10
Factuality | TruthfulQA gen | BLEU Acc 36 | 5
Large Language Model Inference | Llama 3.2 1B | TPOTH 1.07 | 4
Mathematical Reasoning | GSM8K | Exact Match Accuracy 46 | 4
Multitask Language Understanding | MMLU-Pro | Exact Match Accuracy 18 | 4
