
WizardLM: Empowering large pre-trained language models to follow complex instructions

About

Training large language models (LLMs) on open-domain instruction-following data has brought great success. However, manually creating such instruction data is time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show a way to create large amounts of instruction data with varying levels of complexity using an LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. We then mix all the generated instruction data to fine-tune LLaMA, and call the resulting model WizardLM. Human evaluations on a complexity-balanced testbed and on Vicuna's test set show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results on the high-complexity part, we demonstrate that outputs from WizardLM are preferred to outputs from OpenAI ChatGPT. In GPT-4 automatic evaluation, WizardLM achieves more than 90% of ChatGPT's capacity on 17 out of 29 skills. Although WizardLM still lags behind ChatGPT in some aspects, our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing LLMs. Our code and data are public at https://github.com/nlpxucan/WizardLM
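The evolve-then-mix loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rewrite` is a hypothetical callable standing in for the LLM prompt that makes an instruction more complex, the filter that drops empty or unchanged rewrites is an assumption, and keeping the seed instructions in the final pool is also an assumption.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call that rewrites one instruction
# into a more complex one (e.g. by prompting a chat model).
Rewriter = Callable[[str], str]


def evol_instruct(seeds: List[str], rewrite: Rewriter, rounds: int) -> List[str]:
    """Evolve seed instructions for `rounds` steps and pool every generation.

    Each round rewrites every instruction produced by the previous round,
    so complexity grows step by step. The returned pool mixes all rounds,
    including the seeds (an assumption made for this sketch).
    """
    pool: List[str] = list(seeds)
    current: List[str] = list(seeds)
    for _ in range(rounds):
        evolved: List[str] = []
        for instruction in current:
            candidate = rewrite(instruction)
            # Assumed filter: discard rewrites that are empty or identical
            # to their input, since they add no extra complexity.
            if candidate and candidate != instruction:
                evolved.append(candidate)
        pool.extend(evolved)
        current = evolved  # the next round evolves only the newest instructions
    return pool


if __name__ == "__main__":
    # Toy rewriter for illustration only: appends one more requirement per round.
    def add_requirement(text: str) -> str:
        return text + " Also justify the answer."

    for item in evol_instruct(["Sort a list of numbers."], add_requirement, rounds=2):
        print(item)
```

The pooled list, not just the final round, is what would be used for fine-tuning, which matches the abstract's "mix all generated instruction data" and yields a spread of complexity levels.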

Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, Qingwei Lin, Daxin Jiang · 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 7.44e+3 | 1036 |
| Reasoning | BBH | -- | -- | 672 |
| Instruction Following | IFEval | -- | -- | 625 |
| Instruction Following | AlpacaEval 2.0 | Win Rate | 8.5 | 507 |
| Mathematical Reasoning | GSM8K | Accuracy | 64.37 | 499 |
| Multi-turn Dialogue Evaluation | MT-Bench | Overall Score | 7.71 | 447 |
| Mathematical Problem Solving | MATH | Accuracy | 31.94 | 229 |
| Instruction Following | MT-Bench | MT-Bench Score | 5.68 | 215 |
| Question Answering | ARC-C | Accuracy | 59.77 | 192 |
| Table Question Answering | HiTab | Accuracy | 45.2 | 121 |

Showing 10 of 55 rows.
