Exploring the Limits of Language Modeling
About
In this work we explore recent advances in Recurrent Neural Networks for large-scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language. We perform an exhaustive study of techniques such as character Convolutional Neural Networks and Long Short-Term Memory on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
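For readers less familiar with the metric, perplexity is conventionally defined as the exponential of the average negative log-likelihood per token, so lower is better. The snippet below is a minimal sketch of that standard definition, not the authors' evaluation code; the function names `perplexity` and `relative_reduction` and the toy input are assumptions made for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from the natural-log probabilities a model assigned to each token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def relative_reduction(old, new):
    """Fractional improvement when perplexity drops from `old` to `new`."""
    return (old - new) / old


# Hypothetical toy input: a model that gives every token probability 1/30
# has a perplexity of about 30.
toy_log_probs = [math.log(1 / 30)] * 5
print(perplexity(toy_log_probs))

# Relative reductions implied by the figures quoted in the abstract.
print(relative_reduction(51.3, 30.0))  # single model
print(relative_reduction(41.0, 23.7))  # ensemble
```

Read this way, both the single-model and the ensemble results correspond to a reduction in perplexity of a little over 40% relative to the earlier figures.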
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | One Billion Word Benchmark (test) | Test Perplexity | 23.7 | 108 |
| Language Modeling | 1 Billion Word Language Modeling Benchmark holdout (test) | Test Perplexity (10 epochs) | 34.7 | 14 |
| Language Modeling | Billion-Word Benchmark (dev) | Word-Perplexity | 23.7 | 11 |
| Language Modeling | One Billion Word Benchmark | Perplexity | 35.1 | 10 |
| Language Modeling | 100 Billion Word Google News Dataset (test) | Test Perplexity (0.1 epochs) | 67.1 | 9 |
| Language Modeling | 1 Billion Word Benchmark 1.0 (test) | Test Perplexity (10 epochs) | 34.7 | 4 |