
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

About

Recently, inspired by the concept of sparsity, Mixture-of-Experts (MoE) models have gained increasing popularity for scaling model size while keeping the number of activated parameters constant. In this study, we thoroughly investigate the sparsity of the dense LLaMA model by constructing MoE for both the attention (i.e., Attention MoE) and MLP (i.e., MLP MoE) modules in the transformer blocks. Specifically, we investigate different expert construction methods and granularities under the same activation conditions to analyze the impact of sparsifying the model. Additionally, to comprehensively evaluate the model's capabilities across various domains (e.g., conversation, code, math) after sparsification, we apply sparsity to instructed large language models (LLMs) and construct instructed MoE models. To counteract the performance degradation resulting from increased sparsity, we design a two-stage post-training strategy to enhance model performance. Experiments on the LLaMA3 model demonstrate the potential effectiveness of this approach for future development of instructed MoE models. The source code and models are available at https://github.com/OpenSparseLLMs/LLaMA-MoE-v2.
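The core mechanism behind the MLP MoE described above is top-k expert routing: a router scores each token against every expert, only the k highest-scoring experts are executed, and their outputs are combined with renormalized router weights, so the activated parameter count stays constant as experts are added. The following NumPy sketch illustrates that routing logic only; it is not the authors' implementation, and all names and dimensions (`W_router`, `d_model`, the tiny per-expert MLPs) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of top-k expert routing in an MLP-MoE layer.
# NOT the LLaMA-MoE v2 implementation; parameters and sizes are toy values.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Hypothetical parameters: one router matrix, plus a small 2-layer MLP per expert.
W_router = rng.standard_normal((d_model, n_experts))
experts = [
    (rng.standard_normal((d_model, 16)), rng.standard_normal((16, d_model)))
    for _ in range(n_experts)
]

def moe_mlp(x):
    """x: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = x @ W_router                               # (n_tokens, n_experts)
    # Indices of the top-k scoring experts for each token.
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over the selected experts only, so gate weights sum to 1.
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                         # each token...
        for k in range(top_k):                          # ...runs only k experts
            w1, w2 = experts[top_idx[t, k]]
            h = np.maximum(x[t] @ w1, 0.0)              # ReLU hidden activation
            out[t] += gates[t, k] * (h @ w2)
    return out

y = moe_mlp(rng.standard_normal((3, d_model)))
print(y.shape)  # (3, 8)
```

With `top_k = 2` of 4 experts, each token activates only half of the expert parameters per forward pass, which is the sparsity/capacity trade-off the paper studies.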

Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu Cheng • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Commonsense Reasoning | WinoGrande | Accuracy | 56.1 | 776
Code Generation | HumanEval (test) | -- | -- | 444
Physical Interaction Question Answering | PIQA | Accuracy | 67.9 | 323
Language Understanding | MMLU 5-shot (test) | -- | -- | 149
Language Understanding | MMLU 5-shot | -- | -- | 132
Science Question Answering | ARC Easy | Accuracy | 57 | 101
Logical Reasoning | LogiQA | Accuracy | 30.7 | 84
Instruction Following | IFEval (test) | IFEval Score | 36 | 45
Science Question Answering | SciQ | Normalized Accuracy | 88.8 | 44
Commonsense Reasoning | HellaSwag 10-shot (test) | Accuracy | 53.7 | 34

(Showing 10 of 13 rows)
