LLaMA Pro: Progressive LLaMA with Block Expansion

About

Humans generally acquire new skills without compromising old ones; however, the opposite often holds for Large Language Models (LLMs), as in the move from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs based on an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on code and math corpora, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
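The abstract describes block expansion only at a high level. The following is a minimal, dependency-free sketch under stated assumptions: each added block is a copy of the preceding original block with its output projection zeroed, so through its residual connection the new block starts as an identity map, and one new block is inserted after every fixed-size group of original blocks. These initialization details, the hypothetical names `ToyBlock`, `expand_blocks`, `make_block`, and `run`, and the group size of 4 are illustrative assumptions, not taken from the text above. A real model would apply the same idea to the attention and MLP output projections of LLaMA2-7B's 32 decoder blocks rather than to this toy residual block.

```python
import copy
import random
from dataclasses import dataclass

Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class ToyBlock:
    """Hypothetical stand-in for a Transformer block: y = x + W_out @ relu(W_in @ x).

    Only the residual structure matters here; real blocks add attention and norms.
    """
    w_in: Matrix   # hidden x dim
    w_out: Matrix  # dim x hidden (plays the role of the output projection)
    trainable: bool = True

    def forward(self, x: list[float]) -> list[float]:
        h = [max(0.0, v) for v in matvec(self.w_in, x)]
        return [a + b for a, b in zip(x, matvec(self.w_out, h))]


def make_block(dim: int, hidden: int, rng: random.Random) -> ToyBlock:
    """Randomly initialized block standing in for a pretrained layer."""
    w_in = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w_out = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return ToyBlock(w_in, w_out)


def expand_blocks(blocks: list[ToyBlock], group_size: int) -> list[ToyBlock]:
    """Insert one identity-initialized copy after every `group_size` blocks.

    Copies of the original blocks are frozen; only the inserted blocks are
    marked trainable. The input list is left unmodified.
    """
    if group_size <= 0 or len(blocks) % group_size != 0:
        raise ValueError("group_size must evenly divide the number of blocks")
    expanded: list[ToyBlock] = []
    for i, block in enumerate(blocks):
        frozen = copy.deepcopy(block)
        frozen.trainable = False
        expanded.append(frozen)
        if (i + 1) % group_size == 0:
            new_block = copy.deepcopy(block)
            # Zeroing the output projection makes the residual branch add 0,
            # so the new block computes x -> x at initialization.
            new_block.w_out = [[0.0] * len(row) for row in block.w_out]
            new_block.trainable = True
            expanded.append(new_block)
    return expanded


def run(blocks: list[ToyBlock], x: list[float]) -> list[float]:
    """Apply the blocks in sequence, like a decoder stack."""
    for block in blocks:
        x = block.forward(x)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    base = [make_block(dim=4, hidden=8, rng=rng) for _ in range(32)]
    expanded = expand_blocks(base, group_size=4)
    x = [0.5, -1.0, 2.0, 0.25]
    print(len(expanded), sum(b.trainable for b in expanded))  # 40 8
    # The expanded stack reproduces the original outputs before any training.
    print(run(expanded, x) == run(base, x))  # True
```

With 32 original blocks and one insertion per group of 4, the expanded stack has 40 blocks, of which 8 are trainable. Because the inserted blocks start as identity maps, the expanded model initially behaves exactly like the original; freezing the copies of the original blocks corresponds to tuning only the expanded blocks on the new corpus, which is how the approach aims to avoid catastrophic forgetting.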

Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, Ping Luo • 2024

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	HellaSwag	Accuracy	77.94	1896
Code Generation	HumanEval	Pass@1	44.51	1043
Multi-task Language Understanding	MMLU	Accuracy	52.57	881
Language Modeling	WikiText-103 (test)	Perplexity	7.81	703
Multi-turn Dialogue Evaluation	MT-Bench	Overall Score	6.32	532
Question Answering	ARC-E	Accuracy	28.92	523
Commonsense Reasoning	WinoGrande	Accuracy	73.95	453
Boolean Question Answering	BoolQ	Accuracy	64.86	350
Question Answering	BoolQ	Accuracy	68.1	317
Question Answering	ARC-C	Accuracy	23.73	258

Showing 10 of 58 rows
