AlpaGasus: Training A Better Alpaca with Fewer Data
About
Large language models (LLMs) acquire instruction-following capability through instruction finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k examples) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). Applying this strategy, we introduce AlpaGasus, which is finetuned on only 9k high-quality examples filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca, as evaluated by GPT-4 on multiple test sets and by a controlled human evaluation. Its 13B variant matches >90% of the performance of its teacher LLM (i.e., text-davinci-003, which generated the 52k data) on the test tasks. It also trains 5.7x faster, cutting the training time of the 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, our experiments demonstrate the efficacy of the method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus exemplifies a novel data-centric IFT paradigm that can be applied to instruction-tuning data in general, leading to faster training and better instruction-following models. Our project page is available at: https://lichang-chen.github.io/AlpaGasus/
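The selection strategy above can be sketched in a few lines: an LLM rater assigns each instruction/response pair a quality score, and only pairs at or above a threshold are kept for finetuning. The sketch below is illustrative, not the paper's implementation; the `toy_rate` heuristic is a hypothetical stand-in for the actual ChatGPT grading prompt, and the 4.5 cutoff is the threshold reported for reducing Alpaca's 52k examples to roughly 9k.

```python
# Sketch of AlpaGasus-style data selection: score each example with a
# rater, then keep only high-scoring examples for instruction finetuning.

def filter_ift_data(examples, rate, threshold=4.5):
    """Keep examples whose quality score meets the threshold.

    examples  -- list of dicts with 'instruction' and 'response' keys
    rate      -- callable mapping an example to a numeric score (e.g., 0-5);
                 the paper prompts a strong LLM such as ChatGPT for this
    threshold -- minimum score to keep
    """
    return [ex for ex in examples if rate(ex) >= threshold]


# Toy rater (hypothetical, for demonstration only): penalize empty or
# trivially short responses instead of calling an LLM.
def toy_rate(example):
    words = example["response"].strip().split()
    if not words:
        return 0.0
    return 5.0 if len(words) >= 3 else 2.0


data = [
    {"instruction": "Name three primary colors.",
     "response": "Red, yellow, and blue."},
    {"instruction": "Summarize the article.",
     "response": ""},          # low quality: empty response
    {"instruction": "Translate 'hello' to French.",
     "response": "Bonjour"},   # low quality under the toy rater: too short
]

kept = filter_ift_data(data, toy_rate)
print(len(kept))  # 1
```

In practice the rater would send each pair to the filter LLM with a grading prompt and parse the returned score; the filtering logic itself stays this simple.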
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Evaluation | MME | -- | 557 |
| Mathematical Reasoning | MathVista | Score: 23.9 | 322 |
| Science Question Answering | ARC Challenge | Accuracy: 56.4 | 234 |
| Science Question Answering | ScienceQA | -- | 229 |
| Multimodal Understanding | SEED-Bench | -- | 203 |
| Multimodal Evaluation | MMBench | MMB Score: 34.71 | 118 |
| Question Answering | ARC Challenge | Normalized Accuracy: 49.91 | 48 |
| Hallucination and Visual Reasoning Evaluation | HallusionBench | -- | 37 |
| General Language Modeling | MMLU, ARC-Challenge, and CommonsenseQA Aggregate | Average Score: 64.19 | 24 |
| Language Understanding | MMLU | MMLU Score: 65.18 | 24 |