
An Analysis of Neural Language Modeling at Multiple Scales

About

Many of the leading approaches in language modeling introduce novel, complex, and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
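To make "character-level granularity" concrete, here is a minimal sketch of a character-level language model and its evaluation in bits per character (BPC). This is a toy add-alpha-smoothed bigram model, not the paper's LSTM/QRNN architecture; the training and test strings are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def char_bigram_bpc(train_text, test_text, alpha=1.0):
    """Estimate bits-per-character (BPC) of an add-alpha smoothed
    character bigram model trained on train_text."""
    # Vocabulary is built over both strings so every test character is scorable.
    vocab = sorted(set(train_text + test_text))
    counts = defaultdict(Counter)
    for prev, cur in zip(train_text, train_text[1:]):
        counts[prev][cur] += 1
    total_bits = 0.0
    for prev, cur in zip(test_text, test_text[1:]):
        c = counts[prev]
        # Smoothed conditional probability p(cur | prev).
        p = (c[cur] + alpha) / (sum(c.values()) + alpha * len(vocab))
        total_bits += -math.log2(p)  # surprisal of this character, in bits
    return total_bits / (len(test_text) - 1)

# Hypothetical toy corpus; real BPC numbers come from large corpora like enwik8.
bpc = char_bigram_bpc("the cat sat on the mat ", "the cat ate")
```

A character-level model factors the probability of a string over individual characters, so its quality is reported as average bits per character rather than per-word perplexity.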

Stephen Merity, Nitish Shirish Keskar, Richard Socher • 2018

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-103 (test) | Perplexity | 33 | 524 |
| Language Modeling | Penn Treebank (test) | Perplexity | 58.8 | 411 |
| Character-level Language Modeling | enwik8 (test) | BPC | 1.23 | 195 |
| Language Modeling | WikiText-103 (val) | PPL | 32 | 180 |
| Character-level Language Modeling | Penn Treebank (test) | BPC | 1.175 | 113 |
| Word-level Language Modeling | WikiText-103 word-level (test) | Perplexity | 33 | 65 |
| Word-level Language Modeling | WikiText-103 (dev) | Perplexity | 32 | 64 |
| Character-level Prediction | PTB (test) | BPC (Test) | 1.175 | 42 |
| OOD Detection | Penn Treebank (PTB) Characters (test) | FPR90 | 99 | 2 |
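The benchmarks above mix two metrics: perplexity for word-level datasets and BPC for character-level ones. Both are simple transformations of the model's average cross-entropy; the sketch below assumes the loss is measured in nats per token (or per character), as is typical for models trained with a natural-log cross-entropy.

```python
import math

def perplexity_from_nats(nll_nats_per_token):
    # Perplexity = exp(average negative log-likelihood in nats per token).
    return math.exp(nll_nats_per_token)

def bpc_from_nats(nll_nats_per_char):
    # BPC converts the per-character loss from nats to bits (log base 2).
    return nll_nats_per_char / math.log(2)

# Example: a test perplexity of 33 corresponds to a loss of ln(33) nats/token.
loss = math.log(33)
ppl = perplexity_from_nats(loss)   # ~33.0
```

Lower is better for both metrics: halving BPC halves the bits needed to encode each character, and perplexity can be read as the effective branching factor the model faces per word.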
