
Nonparametric Masked Language Modeling

About

Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
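To illustrate the core idea of filling in a [MASK] by retrieval rather than a softmax over a vocabulary, here is a minimal toy sketch. It is not the authors' implementation: random vectors stand in for a trained encoder, and only single tokens are retrieved, whereas NPM uses a contrastively trained encoder and retrieves phrases via start/end token embeddings.

```python
import numpy as np

# Toy sketch of nonparametric masked prediction via nearest-neighbor
# retrieval, in the spirit of NPM. Random vectors stand in for a trained
# encoder; real NPM encodes the masked query and every corpus token with
# an encoder trained with a contrastive, in-batch retrieval objective.

rng = np.random.default_rng(0)
dim = 16

corpus_tokens = ["Seattle", "is", "a", "city", "Thessaloniki", "too"]
corpus_emb = rng.normal(size=(len(corpus_tokens), dim))

# Pretend the encoder maps the [MASK] position of some query sentence to
# a vector close to the corpus embedding of "Thessaloniki".
query = corpus_emb[4] + 0.01 * rng.normal(size=dim)

def normalize(x):
    """L2-normalize along the last axis so dot products are cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Fill the mask with the corpus token of highest cosine similarity --
# a distribution over corpus tokens instead of a parametric softmax.
scores = normalize(corpus_emb) @ normalize(query)
prediction = corpus_tokens[int(np.argmax(scores))]
print(prediction)
```

Because the output space is the corpus itself, any string that appears in the reference corpus (including rare words or non-Latin script) can be predicted, which is the property the abstract highlights.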

Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | RTE | Accuracy | 61.7 | 367 |
| Subjectivity Classification | Subj | Accuracy | 75.5 | 266 |
| Text Classification | AG-News | Accuracy | 74.5 | 248 |
| Sentiment Classification | SST-2 | Accuracy | 87.2 | 174 |
| Sentiment Classification | MR | Accuracy | 83.7 | 148 |
| Sentiment Classification | CR | Accuracy | 81.2 | 142 |
| Text Classification | AGNews | Accuracy | 74.5 | 28 |
| Topic Classification | Yahoo Answers Topics | Accuracy | 53.9 | 26 |
| Sentiment Analysis | Rotten Tomato | Accuracy | 86 | 25 |
| Open-set knowledge retrieval | T-REx (All) | Macro-averaged EM | 34.5 | 19 |

Showing 10 of 22 rows

Other info

Code: github.com/facebookresearch/NPM
