
What Makes Good Instruction-Tuning Data? An In-Context Learning Perspective

About

Instruction-tuning datasets often contain substantial redundancy and low-quality samples, necessitating effective data selection methods. We propose an instruction data selection framework based on weighted in-context influence (wICI), which measures how effectively each candidate example reduces instruction-following difficulty for semantically related peers. Through systematic experiments, we address three key questions: what constitutes effective instruction tuning data from an in-context perspective, whether sample difficulty correlates with in-context influence, and how in-context influence translates to instruction tuning effectiveness. Experiments across multiple models and benchmarks demonstrate that our method consistently outperforms existing baselines under constrained data budgets, while empirically showing that sample difficulty negatively correlates with in-context influence.
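To make the idea concrete, the following is a minimal sketch of a similarity-weighted in-context influence score and a budgeted selection step. The abstract does not give the exact wICI formula, so everything here is an assumption for illustration: the function names, the `loss_fn(demo, target)` interface (a model loss on `target`, with `demo` as an optional in-context demonstration), and the weighting and normalization scheme are hypothetical and may differ from the authors' definition.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence

# Hypothetical interface: loss_fn(demo, target) returns the model's loss on
# `target`, with `demo` prepended as an in-context example (None = zero-shot).
LossFn = Callable[[Optional[Hashable], Hashable], float]


def weighted_in_context_influence(
    candidate: Hashable,
    peers: Sequence[Hashable],
    similarities: Sequence[float],
    loss_fn: LossFn,
) -> float:
    """Similarity-weighted average loss reduction on peers (illustrative only)."""
    total_weight = sum(similarities)
    if total_weight == 0:
        return 0.0
    score = 0.0
    for peer, weight in zip(peers, similarities):
        zero_shot = loss_fn(None, peer)        # difficulty without a demonstration
        one_shot = loss_fn(candidate, peer)    # difficulty with the candidate as demo
        score += weight * (zero_shot - one_shot)
    return score / total_weight


def select_top_k(scores: Dict[Hashable, float], budget: int) -> List[Hashable]:
    """Keep the `budget` candidates with the highest influence scores."""
    ranked = sorted(scores, key=lambda key: scores[key], reverse=True)
    return ranked[:budget]
```

Under this sketch, a candidate scores highly when using it as a demonstration lowers the loss on peers that are semantically close to it, and selection under a data budget reduces to keeping the top-scoring candidates.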

Guangzeng Han, Xiaolei Huang • 2026

Related benchmarks

Task                          Dataset                  Result                             Rank
Instruction Following         AlpacaEval 2.0           Win Rate 7.5                       722
Question Answering            ARC Challenge            Accuracy (ARC) 58.98               598
General Knowledge Evaluation  MMLU                     MMLU Accuracy 64.9                 127
Instruction Following         IFEval (test)            IFEval Score 51.19                 88
Question Answering            MedQA (test)             Accuracy 46.03                     67
Question Answering            MedMCQA (test)           --                                 48
Question Answering            MMLU Med                 Accuracy 65.33                     34
Instruction Following         WizardLM (test)          Score 1.308                        25
Multi-turn Chat Evaluation    MT-Bench                 MT-Bench Score 5.28                20
Instruction Following         AlpacaEval GPT-4 (test)  AlpacaEval Win Rate (GPT-4) 1.261  18
