In-Context Learning Creates Task Vectors

About

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

Roee Hendel, Mor Geva, Amir Globerson• 2023

Related benchmarks

Task	Dataset	Result
Subjectivity Classification	Subj	Accuracy61.12	343
Text Classification	TREC	Accuracy74.12	281
Multitask Language Understanding	MMLU-Pro	Accuracy31.6	248
Text Classification	AG-News	Accuracy57.9	248
Topic Classification	AG-News	Accuracy58.9	225
Text Classification	MR	Accuracy92.36	174
Topic Classification	DBpedia	Accuracy83.24	131
Text Classification	SST-5	Accuracy34.76	119
Text Classification	AGNews	Accuracy81.36	110
Natural Language Inference	aNLI	Accuracy33.17	107

Showing 10 of 48 rows

Other info

Follow for update

@wizwand_team Discord