REALM: Retrieval-Augmented Language Model Pre-Training

About

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
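The retrieve-then-predict idea described above can be illustrated with a minimal sketch. It scores documents against a query by inner product, keeps the top few, and marginalizes the answer probability over them: p(y|x) ≈ Σ_z p(y|z,x) p(z|x). Everything here is an assumption for illustration: the function names, the toy vectors, the hard-coded per-document answer probabilities, and the choice to renormalize the retrieval distribution over only the top-k documents. It is not the authors' implementation, where the embeddings and the reader are learned neural networks.

```python
import math


def dot(u, v):
    """Inner product used as the (hypothetical) query-document relevance score."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_then_predict(query_vec, doc_vecs, answer_prob_given_doc, top_k=2):
    """Approximate p(y|x) by summing p(y|z,x) * p(z|x) over the top-k documents.

    answer_prob_given_doc[i] stands in for a reader model's p(y | z_i, x);
    in this toy sketch it is simply supplied by the caller.
    """
    scores = [dot(query_vec, d) for d in doc_vecs]
    # Indices of the k highest-scoring documents.
    top = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Simplifying assumption: renormalize p(z|x) over the retrieved subset only.
    p_z = softmax([scores[i] for i in top])
    return sum(p * answer_prob_given_doc[i] for p, i in zip(p_z, top))


# Toy example: one query, three documents, made-up reader probabilities.
query = [1.0, 0.0]
docs = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reader_probs = [0.9, 0.1, 0.5]
p_answer = retrieve_then_predict(query, docs, reader_probs, top_k=2)
print(round(p_answer, 4))
```

Because the final probability is a differentiable function of the relevance scores, a training signal on the prediction can in principle flow back into the retriever, which is the property the abstract relies on when it mentions backpropagating through the retrieval step.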

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang • 2020

Related benchmarks

Task                                       Dataset                           Metric            Result  Rank
Open Question Answering                    Natural Questions (NQ) (test)     Exact Match (EM)  40.4    134
Question Answering                         NQ (test)                         EM Accuracy       40.4    66
Open-domain Question Answering             TriviaQA                          EM                55.8    62
Information Retrieval                      BEIR                              --                --      59
End-to-end Open-Domain Question Answering  NQ (test)                         Exact Match (EM)  40.4    50
Open-domain Question Answering             Natural Questions (NQ)            Exact Match (EM)  40.4    46
Open-domain Question Answering             WebQuestions (WQ) Open-QA (test)  Exact Match       40.7    38
Open-domain Question Answering             NaturalQ-Open (test)              EM                40.4    37
Open-domain Question Answering             NQ (Natural Questions)            EM                40.4    33
Open Question Answering                    WEBQUESTIONS (test)               --                --      27
Showing 10 of 19 rows
