Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

About

Large language models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and to generalize robustly across general tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity, and expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce parameter-efficient sparsity crafting (PESC), which crafts dense models into sparse models using the mixture-of-experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, enabling model capacity expansion with a minimal parameter increase while guaranteeing the quality of approximation in function space compared to the original sparse upcycling. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our best sparse model outperforms other sparse and dense models and exhibits superior general capabilities compared to GPT-3.5. Our code is available at https://github.com/wuhy68/Parameter-Efficient-MoE.
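
The "minimal parameter increase" relative to sparse upcycling can be illustrated with a rough per-layer parameter count. The sketch below is only an illustration under stated assumptions: the layer sizes, the adapter bottleneck width, the bias-free two-matrix adapter, and all function names are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerShape:
    """Hypothetical layer sizes; none of these values come from the paper."""
    d_model: int = 4096      # hidden size
    d_ff: int = 11008        # FFN inner size
    ffn_matrices: int = 3    # e.g. gate, up, and down projections
    n_experts: int = 8       # experts per MoE layer
    adapter_dim: int = 64    # adapter bottleneck width (assumed)


def ffn_params(s: LayerShape) -> int:
    # Weights of one dense FFN block, ignoring biases.
    return s.ffn_matrices * s.d_model * s.d_ff


def router_params(s: LayerShape) -> int:
    # A linear router that scores each expert from the hidden state.
    return s.d_model * s.n_experts


def adapter_params(s: LayerShape) -> int:
    # Assumed adapter: down-projection plus up-projection, no biases.
    return 2 * s.d_model * s.adapter_dim


def upcycling_extra(s: LayerShape) -> int:
    # Sparse upcycling: every expert is a full FFN copy, so the layer
    # gains (n_experts - 1) extra FFNs plus a router.
    return (s.n_experts - 1) * ffn_params(s) + router_params(s)


def pesc_extra(s: LayerShape) -> int:
    # Adapter-based crafting: experts share one set of FFN weights and
    # differ only by their adapters, so the layer gains the adapters
    # plus a router.
    return s.n_experts * adapter_params(s) + router_params(s)


if __name__ == "__main__":
    shape = LayerShape()
    print("extra params per layer, upcycling:", upcycling_extra(shape))
    print("extra params per layer, adapters: ", pesc_extra(shape))
    print("ratio:", round(upcycling_extra(shape) / pesc_extra(shape), 1))
```

With these assumed sizes, the adapter-based layer adds a few million parameters where full expert copies would add several hundred million. The exact ratio depends entirely on the chosen dimensions.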

Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu • 2024

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	Accuracy 85.2	1896
Code Generation	HumanEval	Pass@1 48.8	1043
Multi-task Language Understanding	MMLU	Accuracy 75.7	881
Commonsense Reasoning	HellaSwag	HellaSwag Accuracy 77.27	711
Mathematical Reasoning	MATH	Accuracy 24	338
Question Answering	ARC	Accuracy 53.58	230
Question Answering	CommonsenseQA	Accuracy 80.46	150
Question Answering	ScienceQA	Accuracy 94.33	96
Question Answering	Natural Questions	EM 31.2	52
Language Understanding	MMLU	Accuracy 51.07	43

Showing 10 of 14 rows
