Active Example Selection for In-Context Learning
About
With a handful of demonstration examples, large-scale language models show strong capability to perform various tasks by in-context learning from these examples, without any fine-tuning. We demonstrate that in-context learning performance can be highly unstable across samples of examples, indicating the idiosyncrasies of how language models acquire information. We formulate example selection for in-context learning as a sequential decision problem, and propose a reinforcement learning algorithm for identifying generalizable policies to select demonstration examples. For GPT-2, our learned policies demonstrate a strong ability to generalize to tasks unseen during training, with a 5.8% improvement on average. Examples selected by our learned policies can even achieve a small improvement on GPT-3 Ada. However, the improvement diminishes on larger GPT-3 models, suggesting emergent capabilities of large language models.
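The sequential-decision framing described above can be sketched with a toy tabular Q-learning loop: the state is the set of demonstrations chosen so far, an action picks the next example, and a terminal reward scores the resulting prompt. This is a minimal illustration, not the paper's exact algorithm; the pool, the hard-coded "helpful" examples, and the stand-in `reward` function (which in practice would be the language model's validation accuracy on the assembled prompt) are all hypothetical.

```python
import random
from collections import defaultdict

POOL = list(range(10))  # hypothetical candidate demonstration examples
K = 3                   # number of demonstrations to select per prompt

def reward(selected):
    # Toy stand-in reward: pretend examples 1, 4, 7 are the most helpful.
    # In the paper's setting this would be LM accuracy with the prompt.
    return sum(1.0 for i in selected if i in (1, 4, 7)) / K

def train_policy(episodes=2000, eps=0.1, alpha=0.1, seed=0):
    """Tabular Q-learning over (step, example) pairs: a simplified
    sketch of learning an example-selection policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(step, example index)]
    for _ in range(episodes):
        selected = []
        for step in range(K):
            avail = [i for i in POOL if i not in selected]
            if rng.random() < eps:                          # explore
                a = rng.choice(avail)
            else:                                           # exploit
                a = max(avail, key=lambda i: q[(step, i)])
            selected.append(a)
        r = reward(selected)  # terminal reward only
        for step, a in enumerate(selected):
            q[(step, a)] += alpha * (r - q[(step, a)])
    return q

def select(q):
    """Greedy rollout of the learned policy."""
    selected = []
    for step in range(K):
        avail = [i for i in POOL if i not in selected]
        selected.append(max(avail, key=lambda i: q[(step, i)]))
    return selected

if __name__ == "__main__":
    q = train_policy()
    print(select(q))
```

The key design choice mirrored from the abstract is that selection is sequential: each pick conditions on the examples already chosen, rather than scoring examples independently.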
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 86.83 | 1460 |
| Natural Language Inference | RTE | Accuracy | 47.5 | 367 |
| Natural Language Inference | SNLI | Accuracy | 35 | 174 |
| Intent Classification | Banking77 (test) | Accuracy | 84.2 | 151 |
| Commonsense Question Answering | CommonsenseQA | Accuracy | 87.55 | 81 |
| Sentiment Analysis | SST-5 | Accuracy | 43.69 | 47 |
| Natural Language Inference | QNLI | Accuracy | 61.5 | 42 |
| Natural Language Inference | MNLI | Accuracy | 70.92 | 36 |
| Natural Language Inference | MNLI-mm | Accuracy | 29.5 | 30 |
| Paraphrase Detection | PAWS | Accuracy | 51.7 | 24 |