Improving language models by retrieving from trillions of tokens

About

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
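The retrieval step described in the abstract (split text into chunks, embed each chunk, look up the most similar chunks in a database) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the chunk size, the function names, and above all the embedding, which is a bag-of-words count vector standing in for the frozen BERT retriever. The chunked cross-attention that consumes the retrieved neighbours is not shown.

```python
import math
from collections import Counter

CHUNK_SIZE = 4  # toy value chosen for this example, not taken from the paper


def split_into_chunks(tokens, chunk_size=CHUNK_SIZE):
    """Split a token list into consecutive fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def embed(chunk):
    """Toy stand-in for a frozen encoder: bag-of-words token counts."""
    return Counter(chunk)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_database(corpus_tokens, chunk_size=CHUNK_SIZE):
    """Precompute one (embedding, chunk) pair per corpus chunk."""
    return [(embed(c), c) for c in split_into_chunks(corpus_tokens, chunk_size)]


def retrieve(query_chunk, database, k=2):
    """Return the k database chunks most similar to the query chunk."""
    query = embed(query_chunk)
    ranked = sorted(database, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


corpus = "the cat sat on the mat the dog ran in the park a cat chased a mouse".split()
database = build_database(corpus)
neighbours = retrieve("the cat sat quietly".split(), database, k=2)
print(neighbours)
```

In this toy setup the database embeddings are computed once and never updated, which mirrors the role of a frozen retriever; a real system would use a learned encoder and an approximate nearest-neighbour index rather than a linear scan.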

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre • 2021

Related benchmarks

Task	Dataset	Result
Language Modeling	C4 (val)	--	737
Language Modeling	WikiText-103 (test)	Perplexity 2.22	703
Medical Question Answering	MedMCQA	Accuracy 63.3	521
Medical Question Answering	MedQA	Accuracy 69.6	153
Question Answering	HotpotQA	F1 21.1	132
Language Modeling	LAMBADA (test)	--	109
Question Answering	2WikiMultihopQA	EM 11.2	107
Question Answering	Natural Questions (NQ) (test)	Exact Match 45.5	77
Question Answering	Natural Questions (test)	EM 45.5	72
Medical Question Answering	MedExpQA	Overall Accuracy 69.6	70

Showing 10 of 32 rows
