Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation

About

With contributions from the open-source community, a vast amount of instruction tuning (IT) data has emerged. Given the significant resource allocation required for training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data. However, existing methods for instruction data selection have limitations such as relying on fragile external APIs, being affected by biases in GPT models, or reducing the diversity of the selected instruction dataset. In this paper, we propose an industrial-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR). CaR employs a two-step process: first, it ranks instruction pairs using a high-accuracy (84.25%) scoring model aligned with expert preferences; second, it preserves dataset diversity through clustering. In our experiment, CaR efficiently selected a mere 1.96% of Alpaca's IT data, yet the resulting AlpaCaR model surpassed Alpaca's performance by an average of 32.1% in GPT-4 evaluations. Moreover, we find that data selection is a consistent paradigm, whether the pre-trained model is more capable or the model parameters are scaled up. Our approach employs compact models with 550M parameters and incurs just 11.2% of the financial outlay of current methods, enhancing its industrial deployability.

Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Boxing Chen, Hao Yang, Bei Li, Tong Xiao, Jingbo Zhu • 2024
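
The two-step process described in the abstract (rank by quality, then use clusters to preserve diversity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the ranking and the clusters are combined, so the rule used here (keep the overall top-scoring pairs, then add the best few pairs from every cluster) is an assumption. The function name select_instructions and the parameters top_overall and top_per_cluster are hypothetical, and the scores and cluster labels are assumed to be precomputed by some scoring model and some clustering method.

```python
from collections import defaultdict


def select_instructions(scores, cluster_ids, top_overall, top_per_cluster):
    """Return the sorted indices of the selected instruction pairs.

    scores[i]:      quality score of pair i (hypothetical scorer output)
    cluster_ids[i]: cluster label of pair i (from any clustering method)
    """
    if len(scores) != len(cluster_ids):
        raise ValueError("scores and cluster_ids must have the same length")

    # Step 1: rank every pair by quality score, best first,
    # and keep the overall top-scoring pairs.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:top_overall])

    # Step 2 (assumed combination rule): walk the same ranking and keep the
    # best few pairs of each cluster, so low-scoring clusters stay represented.
    kept_per_cluster = defaultdict(int)
    for i in ranked:
        cluster = cluster_ids[i]
        if kept_per_cluster[cluster] < top_per_cluster:
            kept_per_cluster[cluster] += 1
            selected.add(i)

    return sorted(selected)


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
    toy_clusters = [0, 0, 0, 1, 1, 2]
    print(select_instructions(toy_scores, toy_clusters,
                              top_overall=2, top_per_cluster=1))
```

In the toy data, ranking alone would only ever pick pairs from cluster 0; the per-cluster pass is what pulls in one pair each from clusters 1 and 2.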

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Instruction Following | MT-Bench | MT-Bench Score: 6.58 | 189 |
| Faithfulness Hallucination | FollowRAG Faithfulness+ | Faithfulness (NaturalQA): 45.5 | 60 |
| Instruction Following | MT-bench v1.0 (test) | MT-Bench Score: 61.2 | 52 |
| Instruction Following | FollowRAG Instruction | FollowRAG Instruction Score: 42.3 | 30 |
| Instruction Following | FollowRAG Instruction v1 (test) | FollowRAG Instruction Score: 40.5 | 30 |
| Factuality Hallucination | LongFact | Facts Score: 21.1 | 30 |
| Factuality Hallucination Evaluation | LongFact (test) | Response Score: 100 | 30 |
| Factuality Hallucination Evaluation | BioGEN (test) | FactScore: 47.9 | 30 |
| Factuality Hallucination | BioGEN | FactScore: 45.7 | 30 |
| Instruction Following | Tulu3 Evaluation Suite pool (test) | ARC: 91.86 | 25 |
