Knowledge Fusion of Large Language Models

About

While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weights is impractical. In this paper, we introduce the notion of knowledge fusion for LLMs, aimed at combining the capabilities of existing LLMs and transferring them into a single LLM. By leveraging the generative distributions of source LLMs, we externalize their collective knowledge and unique strengths, thereby potentially elevating the capabilities of the target model beyond those of any individual source LLM. We validate our approach using three popular LLMs with different architectures--Llama-2, MPT, and OpenLLaMA--across various benchmarks and tasks. Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation. Our code, model weights, and data are public at \url{https://github.com/fanqiwan/FuseLLM}.

Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi• 2024

Related benchmarks

Task	Dataset	Result
Science Question Answering	ARC Challenge	Accuracy14.58	354
Mathematical Reasoning	AIME	AIME Accuracy0.00e+0	288
Reasoning	HellaSwag (HS)	HellaSwag Accuracy87.81	209
General Reasoning	BBH	Accuracy77.62	190
General Reasoning	MMLU	MMLU Accuracy83.92	180
Instruction Following	UnNI	Rouge-L17.25	178
Code Generation	HumanEval	Pass@118.4	171
Science Question Answering	ARC Easy	Accuracy21.52	162
Knowledge	MMLU	Accuracy80.62	161
Coding	MBPP	Accuracy79.28	145

Showing 10 of 78 rows

...

Other info

Follow for update

@wizwand_team Discord