Nonparametric Masked Language Modeling
About
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
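The core idea above — replacing the vocabulary softmax with a softmax over corpus positions — can be sketched minimally. The snippet below is a simplified illustration, not the released implementation: it retrieves single tokens rather than phrases (NPM proper uses start/end embeddings for phrase spans), and the embeddings and `nonparametric_mask_fill` helper are hypothetical toy constructions.

```python
import numpy as np

def nonparametric_mask_fill(query_vec, corpus_vecs, corpus_tokens):
    """Fill a [MASK] by retrieving a token from a reference corpus.

    Instead of a softmax over a fixed vocabulary, the distribution is a
    softmax over every token occurrence in the corpus (simplified here
    to single tokens; NPM itself retrieves phrases).
    """
    # Similarity of the masked position's encoding to every corpus token.
    scores = corpus_vecs @ query_vec
    # Nonparametric distribution: softmax over corpus positions, not vocab.
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Aggregate probability mass per surface token, since the same token
    # may occur at many corpus positions.
    token_prob = {}
    for tok, p in zip(corpus_tokens, probs):
        token_prob[tok] = token_prob.get(tok, 0.0) + p
    return max(token_prob, key=token_prob.get)

# Toy corpus: three token occurrences with 2-d "embeddings" (made up).
corpus_tokens = ["Seattle", "Seattle", "apple"]
corpus_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.0])  # encoding of the [MASK] position
print(nonparametric_mask_fill(query, corpus_vecs, corpus_tokens))
```

Because the candidate set is the corpus itself, the model can output tokens it never saw at training time — which is why the paper reports gains on rare and nearly unseen words.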
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | RTE | Accuracy | 61.7 | 367 |
| Subjectivity Classification | Subj | Accuracy | 75.5 | 266 |
| Text Classification | AG-News | Accuracy | 74.5 | 248 |
| Sentiment Classification | SST-2 | Accuracy | 87.2 | 174 |
| Sentiment Classification | MR | Accuracy | 83.7 | 148 |
| Sentiment Classification | CR | Accuracy | 81.2 | 142 |
| Text Classification | AGNews | Accuracy | 74.5 | 28 |
| Topic Classification | Yahoo Answers Topics | Accuracy | 53.9 | 26 |
| Sentiment Analysis | Rotten Tomatoes | Accuracy | 86.0 | 25 |
| Open-set knowledge retrieval | T-REx (All) | Macro-averaged EM | 34.5 | 19 |