
Improving language models by retrieving from trillions of tokens

About

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
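
The chunk-level retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for the frozen BERT retriever, and the names `split_into_chunks`, `retrieve`, the chunk size of 4, and the three-entry database are all hypothetical choices made for illustration. The sketch only shows how an input sequence might be split into chunks and how each chunk might be matched to its most similar database entry.

```python
import math
from collections import Counter


def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks (the last may be shorter)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def embed(chunk):
    """Toy bag-of-words embedding; an assumed stand-in for a frozen BERT encoder."""
    return Counter(chunk)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_chunk, database, k=2):
    """Return the k database chunks most similar to the query chunk."""
    q = embed(query_chunk)
    ranked = sorted(database, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical miniature retrieval database (the paper's holds trillions of tokens).
database = [
    "the cat sat on the mat".split(),
    "stock prices fell sharply today".split(),
    "a cat chased the mouse".split(),
]

tokens = "the cat sat near the mouse and stock prices fell".split()
chunks = split_into_chunks(tokens, chunk_size=4)

# One nearest neighbour per input chunk.
neighbours = [retrieve(c, database, k=1)[0] for c in chunks]

for chunk, neighbour in zip(chunks, neighbours):
    print(" ".join(chunk), "->", " ".join(neighbour))
```

In a full model, the neighbours retrieved for a chunk would be encoded and fed to the cross-attention layers that condition the tokens following that chunk. This sketch stops at the retrieval step and uses exhaustive search, which would presumably not scale to a database of the size the abstract mentions.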

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre • 2021

Related benchmarks

Task                            Dataset                            Result            Rank
Language Modeling               WikiText-103 (test)                Perplexity 2.22   524
Language Modeling               C4 (val)                           --                392
Question Answering              HotpotQA                           F1 21.1           114
Question Answering              2WikiMultihopQA                    EM 11.2           73
Question Answering              Natural Questions (test)           EM 45.5           72
Language Modeling               LAMBADA (test)                     Accuracy 73       71
Open-domain Question Answering  NaturalQ-Open (test)               EM 45.5           37
Question Answering              Natural Questions (NQ) (test)      Exact Match 45.5  35
Open-ended generation           WikiText-103 (test)                MAUVE 0.2286      26
Open-ended Text Generation      Law-MT Out of Domain (test)        MAUVE 20.35       16
Showing 10 of 15 rows
