
Pre-Training Transformers as Energy-Based Cloze Models

About

We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
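The noise-contrastive estimation (NCE) objective described above can be sketched in a few lines. The model's scalar energy E(x) is turned into the probability that a token came from the data rather than from a noise distribution q, and the loss pushes that probability up for real tokens and down for noise samples. This is a minimal NumPy sketch under assumed inputs (`energy_data`, `energy_noise` are hypothetical per-token energies from the model; `logq_*` are log-probabilities under the noise distribution; `k` is the number of noise samples per data sample), not the authors' implementation.

```python
import numpy as np

def nce_loss(energy_data, logq_data, energy_noise, logq_noise, k):
    """NCE-style loss for an energy-based cloze model (illustrative sketch).

    A token is classified as "real" with probability
        D(x) = sigmoid(-E(x) - log(k * q(x))),
    i.e. low energy and low noise-probability both push D(x) toward 1.
    """
    def d(energy, logq):
        logit = -energy - (np.log(k) + logq)
        return 1.0 / (1.0 + np.exp(-logit))

    # Real tokens should be classified as data; noise samples as noise.
    loss_data = -np.log(d(energy_data, logq_data))
    loss_noise = -np.log(1.0 - d(energy_noise, logq_noise))
    return loss_data.mean() + k * loss_noise.mean()
```

With zero energies and uniform noise log-probabilities of zero and k = 1, both classification probabilities are 0.5 and the loss reduces to 2·log 2, which makes the sketch easy to sanity-check.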

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning • 2020

Related benchmarks

Task                            Dataset                    Metric           Result   Rank
Natural Language Understanding  GLUE (test)                SST-2 Accuracy   91.1     416
ASR rescoring                   LibriSpeech clean (test)   WER              5.65     21
ASR rescoring                   LibriSpeech (test-other)   WER              17.42    21
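The ASR rescoring results above come from re-ranking n-best hypothesis lists with Electric's likelihood scores: the sum of negative per-token energies serves as a language-model score that is combined with each hypothesis's acoustic score. The sketch below shows the re-ranking step only; `energy_fn` is a hypothetical stand-in for the trained model, and the interpolation weight `lm_weight` is an assumed tuning parameter.

```python
def rerank_nbest(hypotheses, energy_fn, lm_weight=0.5):
    """Re-rank ASR n-best hypotheses (illustrative sketch).

    hypotheses: list of (tokens, acoustic_score) pairs.
    energy_fn(tokens, i): scalar energy of token i given its context
        (a stand-in for an Electric-style model; lower = more likely).
    Returns hypotheses sorted best-first by the combined score.
    """
    def score(hyp):
        tokens, acoustic = hyp
        # Sum of negative energies acts as a pseudo-log-likelihood;
        # unlike a masked LM, all positions can be scored in one pass.
        lm_score = -sum(energy_fn(tokens, i) for i in range(len(tokens)))
        return acoustic + lm_weight * lm_score

    return sorted(hypotheses, key=score, reverse=True)
```

Because Electric scores every position without masking, this rescoring needs a single forward pass per hypothesis, which is the source of the speed advantage over masked language models noted in the abstract.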
