An Analysis of Neural Language Modeling at Multiple Scales
About
Many of the leading approaches in language modeling introduce novel, complex, and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them both to larger vocabularies and to character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
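The LSTM at the core of these models is the standard gated recurrent cell. As a minimal illustration (not the paper's implementation, which uses full vector-valued gates and tied/adaptive softmax layers), here is a single-unit, scalar LSTM step; the weight layout `W` is a hypothetical convenience for the sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, W):
    """One step of a single-unit LSTM cell on scalar inputs.

    W maps each gate name to a (w_x, w_h, b) triple; real models use
    matrices over vector-valued hidden states instead.
    """
    def gate(name, act):
        w_x, w_h, b = W[name]
        return act(w_x * x + w_h * h_prev + b)

    i = gate("i", sigmoid)    # input gate: how much new content to write
    f = gate("f", sigmoid)    # forget gate: how much old cell state to keep
    o = gate("o", sigmoid)    # output gate: how much cell state to expose
    g = gate("g", math.tanh)  # candidate cell content
    c = f * c_prev + i * g    # updated cell state
    h = o * math.tanh(c)      # updated hidden state
    return h, c
```

With all weights and biases at zero, every sigmoid gate opens halfway and the candidate content is zero, so the state stays at zero; tuning these gates (plus regularization and optimization hyperparameters) is exactly the "properly tuned" part the abstract refers to.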
Stephen Merity, Nitish Shirish Keskar, Richard Socher • 2018
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-103 (test) | Perplexity | 33 | 524 |
| Language Modeling | Penn Treebank (test) | Perplexity | 58.8 | 411 |
| Character-level Language Modeling | enwik8 (test) | BPC | 1.23 | 195 |
| Language Modeling | WikiText-103 (val) | PPL | 32 | 180 |
| Character-level Language Modeling | Penn Treebank (test) | BPC | 1.175 | 113 |
| Word-level Language Modeling | WikiText-103 word-level (test) | Perplexity | 33 | 65 |
| Word-level Language Modeling | WikiText-103 (dev) | Perplexity | 32 | 64 |
| Character-level Prediction | PTB (test) | BPC (Test) | 1.175 | 42 |
| OOD Detection | Penn Treebank (PTB) Characters (test) | FPR90 | 99 | 2 |
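The table mixes two views of the same quantity: perplexity (word-level) and bits-per-character (character-level) are both monotone transforms of the model's average cross-entropy. A quick sketch of the standard conversions, assuming the cross-entropy `nll_nats` is an average per-token negative log-likelihood in nats:

```python
import math

def perplexity(nll_nats: float) -> float:
    """Perplexity = exp of the average per-token NLL (in nats)."""
    return math.exp(nll_nats)

def bits_per_character(nll_nats: float) -> float:
    """BPC = average per-character NLL converted from nats to bits."""
    return nll_nats / math.log(2)

# A char-level model at BPC 1.175 (the Penn Treebank entry above) has an
# average cross-entropy of 1.175 * ln(2) nats per character, i.e. a
# per-character perplexity of 2 ** 1.175.
h = 1.175 * math.log(2)
print(bits_per_character(h))   # 1.175
print(perplexity(h))           # == 2 ** 1.175
```

Lower is better for both metrics (and for the FPR90 false-positive rate in the OOD row), while Rank counts the leaderboard position.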