#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models

About

Foundation language models acquire instruction-following ability through supervised fine-tuning (SFT). Diversity and complexity are considered critical factors of a successful SFT dataset, yet their definitions remain obscure and lack quantitative analysis. In this work, we propose InsTag, an open-set fine-grained tagger, to tag samples within SFT datasets based on semantics and intentions, and we define instruction diversity and complexity in terms of tags. We obtain 6.6K tags to describe comprehensive user queries. We then analyze popular open-source SFT datasets and find that model ability grows with more diverse and complex data. Based on this observation, we propose an InsTag-based data selector to select 6K diverse and complex samples from open-source datasets and fine-tune models on the InsTag-selected data. The resulting models, TagLM, outperform open-source models trained on considerably larger SFT datasets when evaluated on MT-Bench, echoing the importance of query diversity and complexity. We open-source InsTag at https://github.com/OFA-Sys/InsTag.

Keming Lu, Hongyi Yuan, Zheng Yuan, Runji Lin, Junyang Lin, Chuanqi Tan, Chang Zhou, Jingren Zhou • 2023
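
The abstract defines diversity and complexity in terms of tags and describes a selector that picks diverse and complex samples. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes each sample already carries a set of tags (the tagger itself is not shown), treats diversity as the number of distinct tags and complexity as the average number of tags per sample, and selects greedily by visiting the most heavily tagged samples first and keeping only those that contribute an unseen tag. The function names (`tag_diversity`, `tag_complexity`, `select_samples`) and the example data are hypothetical.

```python
from typing import Dict, List, Set


def tag_diversity(tagged: Dict[str, Set[str]]) -> int:
    """Assumed diversity measure: number of distinct tags in the dataset."""
    return len(set().union(*tagged.values())) if tagged else 0


def tag_complexity(tagged: Dict[str, Set[str]]) -> float:
    """Assumed complexity measure: average number of tags per sample."""
    if not tagged:
        return 0.0
    return sum(len(tags) for tags in tagged.values()) / len(tagged)


def select_samples(tagged: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedy sketch: prefer samples with many tags, keep those adding new tags."""
    # Most tags first; ties are broken by sample id so the result is deterministic.
    ordered = sorted(tagged, key=lambda sid: (-len(tagged[sid]), sid))
    covered: Set[str] = set()
    chosen: List[str] = []
    for sid in ordered:
        if len(chosen) >= budget:
            break
        # Keep the sample only if it contributes at least one uncovered tag.
        if tagged[sid] - covered:
            chosen.append(sid)
            covered |= tagged[sid]
    return chosen


# Hypothetical tagged queries, standing in for tagger output.
example = {
    "q1": {"coding", "python", "debugging"},
    "q2": {"coding", "python"},
    "q3": {"translation"},
    "q4": {"math", "translation"},
}
selected = select_samples(example, budget=2)
```

With this toy data, `q2` is skipped because its tags are already covered by `q1`, so a small budget is spent on samples that widen tag coverage rather than repeat it.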

Related benchmarks

Task	Dataset	Metric	Result
Multi-task Language Understanding	MMLU	MMLU Accuracy	64.82	456
Commonsense Reasoning	HellaSwag	HellaSwag Score	84.28	53
Science Question Answering	ARC-C	ARC-C Score	62.21	43
Mathematical Reasoning	GSM	GSM Accuracy	60.65	27
Instruction Following	Tulu3 Evaluation Suite pool (test)	ARC	85.42	25
Code Generation	Codex	CodeX Score	48.35	20
Truthfulness	TruthfulQA	TruthfulQA Score	63.04	20
Aggregate performance evaluation	MMLU, GSM, HellaSwag, TruthfulQA, ARC-C, CodeX	Improvement	0.96	18

