
Instruction Mining: Instruction Data Selection for Tuning Large Language Models

About

Large language models (LLMs) are initially pretrained for broad capabilities and then finetuned on instruction-following datasets to improve how they interact with humans. Despite advances in finetuning, a standardized guideline for selecting high-quality datasets to optimize this process remains elusive. In this paper, we first propose InstructMining, an innovative method for automatically selecting premium instruction-following data for finetuning LLMs. Specifically, InstructMining uses natural language indicators as a measure of data quality and applies them to evaluate unseen datasets. During experimentation, we discover that a double descent phenomenon exists in large language model finetuning. Based on this observation, we further leverage BlendSearch to find the best subset of the entire dataset (i.e., 2,532 out of 100,000 examples). Experiment results show that InstructMining-7B achieves state-of-the-art performance on two of the most popular benchmarks: LLM-as-a-judge and the Huggingface OpenLLM leaderboard.

Yihan Cao, Yanbin Kang, Chi Wang, Lichao Sun • 2023
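The selection idea in the abstract can be sketched as a two-stage procedure: score each example with a data-quality estimator, then search over candidate subset sizes for the one that minimizes evaluation loss. The sketch below is illustrative only: the `quality_score` heuristic and the loss model are hypothetical placeholders, not the paper's actual natural language indicators or the BlendSearch algorithm.

```python
def quality_score(example):
    """Toy stand-in for the paper's natural language quality indicators.

    Here we simply reward longer responses; the real method fits a
    quality estimator from several linguistic indicators.
    """
    return len(example["response"].split())

def select_subset(dataset, candidate_sizes, eval_loss):
    """Rank examples by quality, then pick the subset size with the
    lowest evaluation loss (a simplified stand-in for BlendSearch)."""
    ranked = sorted(dataset, key=quality_score, reverse=True)
    best_size, best_loss = None, float("inf")
    for k in candidate_sizes:
        loss = eval_loss(ranked[:k])  # finetune-and-evaluate in practice
        if loss < best_loss:
            best_size, best_loss = k, loss
    return ranked[:best_size], best_size

# Tiny synthetic demo: the loss dips at an intermediate subset size,
# mimicking the non-monotonic (double-descent-style) behavior the
# paper exploits when choosing 2,532 of 100,000 examples.
data = [{"response": "word " * n} for n in range(1, 101)]
losses = {10: 0.9, 25: 0.6, 50: 0.8, 100: 0.7}
subset, k = select_subset(data, [10, 25, 50, 100], lambda s: losses[len(s)])
```

In this toy run the search settles on the intermediate size (25), since the synthetic loss is lowest there; the real pipeline would replace the lookup table with an actual finetune-and-evaluate loop.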

Related benchmarks

| Task                  | Dataset           | Metric           | Result | Rank |
|-----------------------|-------------------|------------------|--------|------|
| Language Modeling     | WikiText2         | Perplexity       | 12.02  | 2839 |
| Language Modeling     | WikiText-2 (test) | PPL              | 17.83  | 1949 |
| Commonsense Reasoning | HellaSwag         | Accuracy         | 75.51  | 1891 |
| Language Modeling     | WikiText-2        | Perplexity (PPL) | 13.67  | 1624 |
| Commonsense Reasoning | WinoGrande        | Accuracy         | 68.56  | 1085 |
| Language Modeling     | PTB               | Perplexity       | 21.51  | 1034 |
| Commonsense Reasoning | PIQA              | Accuracy         | 77.58  | 751  |
| Language Modeling     | PTB (test)        | Perplexity       | 28.87  | 526  |
| Question Answering    | ARC-E             | Accuracy         | 66.96  | 416  |
| Question Answering    | BoolQ             | Accuracy         | 68.31  | 317  |

(10 of 29 benchmark rows shown)
