
In-Context Learning Creates Task Vectors

About

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
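The two-step view in the abstract can be made concrete with a toy sketch. This is not the paper's implementation (there, $\boldsymbol{\theta}(S)$ is a hidden activation extracted from the transformer's forward pass over the demonstrations); here the "task" is a linear map, the model is a toy linear readout, and all names are illustrative. The point is the structure: compress the demonstration set $S$ into a single vector, then answer the query using only $x$ and that vector, with no further access to $S$.

```python
import numpy as np

def task_vector(S):
    """Step 1: compress demonstrations S = [(x, y), ...] into one
    vector theta(S). In this toy sketch, theta is a least-squares
    fit to the demo pairs; in the paper it is a hidden state."""
    X = np.array([x for x, _ in S], dtype=float)
    y = np.array([y for _, y in S], dtype=float)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def apply_task(theta, x):
    """Step 2: answer a query using only x and theta(S).
    The demonstrations themselves are no longer needed."""
    return float(np.asarray(x, dtype=float) @ theta)

# Demonstrations of the (hidden) task y = 2*x1 + 3*x2
S = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0)]
theta = task_vector(S)
print(round(apply_task(theta, [2.0, 2.0]), 3))  # -> 10.0
```

The separation mirrors the paper's claim: once $\boldsymbol{\theta}(S)$ exists, ICL behaves like a standard hypothesis-class setup where $\theta$ selects the function and $x$ alone is evaluated.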

Roee Hendel, Mor Geva, Amir Globerson • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Classification | AG-News | Accuracy | 57.9 | 248 |
| Topic Classification | AG-News | Accuracy | 58.9 | 173 |
| Commonsense Question Answering | CommonsenseQA | Accuracy | 22 | 81 |
| Semantic Antonym Prediction | Antonym | Accuracy | 65.7 | 44 |
| Machine Translation | English-French | Accuracy | 73.8 | 42 |
| Sentiment Classification | Sentiment classification | Accuracy | 77.1 | 32 |
| Knowledge Retrieval / Relation Prediction | Person-Instrument | Accuracy | 0.706 | 30 |
| Named Entity Recognition | NER person | Accuracy | 0.626 | 26 |
| Named Entity Recognition | NER location | Accuracy | 41.9 | 26 |
| Named Entity Recognition | NER organization | Accuracy | 51.1 | 26 |

Showing 10 of 14 rows.
