From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
About
In the realm of Large Language Models (LLMs), the balance between instruction data quality and quantity is a focal point. Recognizing this, we introduce a self-guided methodology for LLMs to autonomously discern and select cherry samples from open-source datasets, minimizing the manual curation and cost of instruction tuning. Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability. Applying IFD pinpoints cherry samples and yields a marked gain in training efficiency. Empirical validations on datasets like Alpaca and WizardLM support our findings: with a mere 10% of the original data, our strategy delivers improved results. This synthesis of self-guided cherry-picking and the IFD metric marks a transformative step in the instruction tuning of LLMs, promising both efficient and resource-conscious advancements. Code, data, and models are available at: https://github.com/tianyi-lab/Cherry_LLM
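The core idea behind IFD can be sketched as a ratio of two perplexities: how hard the answer is for the model *given* the instruction versus how hard the answer is on its own. The sketch below is illustrative, not the repository's implementation; the function name, inputs (per-token log-probabilities from any causal LM), and the example numbers are all assumptions.

```python
import math

def ifd_score(cond_token_logprobs, direct_token_logprobs):
    """Illustrative IFD: perplexity of the answer conditioned on the
    instruction divided by perplexity of the answer alone.
    Inputs are per-token log-probabilities (hypothetical helper)."""
    # Average negative log-likelihood over the answer tokens.
    cond_nll = -sum(cond_token_logprobs) / len(cond_token_logprobs)
    direct_nll = -sum(direct_token_logprobs) / len(direct_token_logprobs)
    # exp(NLL) is perplexity; the ratio reduces to exp(cond - direct).
    return math.exp(cond_nll) / math.exp(direct_nll)

# If the instruction makes the answer much easier, the ratio is low
# (an "easy" sample); if it barely helps, the ratio is near 1, flagging
# a difficult cherry candidate. Toy log-probs, for illustration only:
easy = ifd_score([-0.2, -0.1, -0.3], [-2.0, -1.8, -2.2])
hard = ifd_score([-1.9, -2.0, -1.8], [-2.0, -1.8, -2.2])
```

Samples are then ranked by this score, and only the highest-IFD fraction (e.g. the top 10% mentioned above) is kept for tuning.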
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity: 10.17 | 2839 |
| Language Modeling | WikiText-2 (test) | Perplexity: 18.54 | 1949 |
| Commonsense Reasoning | HellaSwag | Accuracy: 75.59 | 1891 |
| Language Modeling | WikiText-2 | Perplexity: 13.51 | 1624 |
| Object Hallucination Evaluation | POPE | Accuracy: 82.6 | 1455 |
| Visual Question Answering | VQA v2 | Accuracy: 74 | 1362 |
| Visual Question Answering | TextVQA | Accuracy: 51.8 | 1285 |
| Commonsense Reasoning | WinoGrande | Accuracy: 68.63 | 1085 |
| Language Modeling | PTB | Perplexity: 19.31 | 1034 |
| Text-based Visual Question Answering | TextVQA | Accuracy: 51.8 | 807 |