Finetuned Language Models Are Zero-Shot Learners

About

This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning.
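As a rough illustration of what "verbalized via natural language instruction templates" can look like, the sketch below turns one labeled natural language inference example into an input/target text pair. The template wordings, the `verbalize` helper, the `OPTIONS:` formatting, and the example record are all hypothetical and chosen for illustration. They are not taken from the paper or its released code.

```python
import random

# Hypothetical instruction templates for a natural language inference task.
# The wordings are illustrative assumptions, not the paper's templates.
NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?\n{options}",
    "{premise}\nBased on the paragraph above, can we conclude that "
    "\"{hypothesis}\"?\n{options}",
    "Read the premise and decide whether the hypothesis follows from it.\n"
    "Premise: {premise}\nHypothesis: {hypothesis}\n{options}",
]


def verbalize(example, templates, rng):
    """Render one labeled example as an (input text, target text) pair."""
    # Sampling a template per example exposes the model to varied phrasings
    # of the same underlying task.
    template = rng.choice(templates)
    # List the answer choices so the model knows which outputs are expected.
    options = "OPTIONS: " + ", ".join(example["choices"])
    prompt = template.format(
        premise=example["premise"],
        hypothesis=example["hypothesis"],
        options=options,
    )
    # The target is the answer written out as text, not a class index.
    target = example["choices"][example["label"]]
    return prompt, target


# A made-up example record, used only to exercise the sketch.
example = {
    "premise": "A dog is running through a field.",
    "hypothesis": "An animal is outdoors.",
    "choices": ["yes", "no"],
    "label": 0,
}

rng = random.Random(0)
prompt, target = verbalize(example, NLI_TEMPLATES, rng)
print(prompt)
print("->", target)
```

Applying a function like this across many datasets would yield the kind of mixed instruction-formatted training set the abstract describes. Zero-shot evaluation would then use task types held out from that mixture.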

Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le • 2021

Related benchmarks

Task	Dataset	Result
Instruction Following	IFEval	--	836
Instruction Following	AlpacaEval 2.0	--	722
Natural Language Inference	RTE	Accuracy 79.9	590
Question Answering	OpenBookQA	Accuracy 77.4	465
General Knowledge	MMLU	MMLU General Knowledge Accuracy 67.7	307
Reading Comprehension	BoolQ	Accuracy 83.6	279
Question Answering	ARC	Accuracy 71	230
Mathematical Problem Solving	MATH	Accuracy 51.7	229
Topic Classification	AG-News	Accuracy 86.36	225
Natural Language Inference	SNLI	Accuracy 62.3	196

Showing 10 of 103 rows
