
GenQA: Generating Millions of Instructions from a Handful of Prompts

About

Most public instruction finetuning datasets are relatively small compared to the closed-source datasets used to train industry models. To study questions about finetuning at scale, such as curricula and learning rate cooldown schedules, there is a need for industrial-scale datasets. However, this scale necessitates a data generation process that is almost entirely automated. In this work, we study methods for generating large instruction datasets from a single prompt. With little human oversight, we get LLMs to write diverse sets of instruction examples ranging from simple completion tasks to complex multi-turn dialogs across a variety of subject areas. When finetuning a Llama-3 8B base model, our dataset meets or exceeds both WizardLM and Ultrachat on knowledge-intensive leaderboard tasks as well as on conversational evaluations. We release our dataset, the "generator" prompts that created it, and our finetuned model checkpoints.
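The core idea of a "generator" prompt is that a single meta-prompt can be instantiated with randomness to elicit diverse instructions from an LLM. A minimal sketch of that pattern is below; the template wording, the `make_generator_prompt` helper, and the `query_llm` stub are illustrative assumptions, not the paper's actual prompts or code.

```python
import random

# Hypothetical meta-prompt: ask the model to enumerate subtopics first,
# then commit to a randomly chosen index, nudging outputs toward diversity.
GENERATOR_PROMPT = (
    "List {n} subtopics of {domain}. "
    "Then pick subtopic number {k} and write one instruction-response "
    "pair about it."
)


def make_generator_prompt(domain, n=50, rng=None):
    """Instantiate the meta-prompt with a random subtopic index."""
    rng = rng or random.Random()
    k = rng.randint(1, n)  # the injected randomness that drives diversity
    return GENERATOR_PROMPT.format(n=n, domain=domain, k=k)


def query_llm(prompt):
    """Placeholder for a call to an actual LLM API (assumption)."""
    raise NotImplementedError("wire up your model client here")


if __name__ == "__main__":
    # Each call yields a differently targeted instruction request.
    print(make_generator_prompt("mathematics"))
```

At scale, millions of examples come from looping this over domains and sampled indices, with each returned instruction-response pair appended to the dataset.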

Jiuhai Chen, Rifaa Qadri, Yuxin Wen, Neel Jain, John Kirchenbauer, Tianyi Zhou, Tom Goldstein • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Table Question Answering | WTQ | Accuracy | 21.63 | 101 |
| Table Question Answering | HiTab | Accuracy | 57.14 | 67 |
| Table Question Answering | TabMWP | Accuracy | 41.06 | 53 |
| Table Question Answering | AIT-QA | Accuracy | 70.35 | 41 |
| Table-based Fact Verification | TabFact | Accuracy | 39.38 | 33 |
| Table Summarization | QTSumm | Accuracy | 76.9 | 24 |
| Tabular Understanding | TableGPT | Accuracy | 59.87 | 24 |
| Table Question Answering | TabMCQ | Accuracy | 55.01 | 24 |
| Table Reasoning | InfoTabs | Accuracy | 83.02 | 24 |
| Table-to-text Generation | FeTaQA | Accuracy | 62.77 | 24 |
