Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

About

We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules from self-generated synthetic data; each module equips a shared base LLM with distinct domain-specific capabilities and is activated via self-optimized routing. This allows for dynamic, capability-specific handling of various target tasks and enhances overall capabilities without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs can involve trade-offs in performance on non-specialized tasks. In contrast, Self-MoE demonstrates substantial improvements (6.5 percentage points on average) over the base LLM across diverse benchmarks covering knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design through semantic experts and routing. Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
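To make the idea of expert modules on a shared base with routing more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how experts or the router are parameterized, so the low-rank (LoRA-style) expert form, the linear softmax router, top-k selection, and every name below (`MixtureOfExpertsLayer`, `route`, `forward`, the expert labels) are assumptions made purely for illustration, and the weights are random rather than trained.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian matrix; stands in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MixtureOfExpertsLayer:
    """Hypothetical layer: one shared base weight plus routed low-rank experts."""

    def __init__(self, dim, rank, expert_names, top_k=1, seed=0):
        rng = random.Random(seed)
        self.expert_names = list(expert_names)
        self.top_k = top_k
        # Shared base weight, reused by every expert (assumed frozen).
        self.base = rand_matrix(dim, dim, rng)
        # Assumption: each expert is a low-rank pair (down, up) added to the base.
        self.experts = [
            (rand_matrix(rank, dim, rng), rand_matrix(dim, rank, rng))
            for _ in self.expert_names
        ]
        # Assumption: the router is a single linear map producing one score per expert.
        self.router = rand_matrix(len(self.expert_names), dim, rng)

    def route(self, x):
        """Return {expert_index: weight} for the top-k experts, renormalized to sum to 1."""
        probs = softmax(matvec(self.router, x))
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        return {i: probs[i] / norm for i in chosen}

    def forward(self, x):
        """Base output plus the weighted low-rank updates of the selected experts."""
        out = matvec(self.base, x)
        weights = self.route(x)
        for i, w in weights.items():
            down, up = self.experts[i]
            delta = matvec(up, matvec(down, x))
            out = [o + w * d for o, d in zip(out, delta)]
        return out, {self.expert_names[i]: w for i, w in weights.items()}


# Illustrative expert labels, borrowed from the benchmark areas named in the abstract.
layer = MixtureOfExpertsLayer(
    dim=4, rank=2, expert_names=["knowledge", "reasoning", "math", "coding"], top_k=1
)
output, used = layer.forward([1.0, 0.5, -0.3, 2.0])
print(used)  # maps the selected expert's name to its routing weight
```

Because the routing decision is returned alongside the output, one can inspect which named expert handled a given input, which loosely mirrors the interpretability the abstract attributes to semantic experts and routing.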

Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter • 2024

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Mathematics	MATH	MATH Accuracy	49.5	172
Reasoning	ARC-C	Accuracy (ARC-c)	84.6	113
Mathematics	GSM8K	GSM8K Score	87	87
Reasoning	BBH	BBH Score	67.8	53
Coding	MBPP	Overall Average Score	71.2	37
Dialogue	IFEval	IFEval	77.9	34
Dialogue	AlpacaEval 2	AlpacaEval2 Score	38.8	34
Coding	HumanEval	HumanEval	70.6	28
