Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
About
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen • 2017
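The method the abstract alludes to is a Mixture of Softmaxes (MoS): instead of a single softmax over a rank-d logit matrix, the model mixes K softmax components in probability space, so the resulting log-probability matrix is no longer constrained to low rank. A minimal NumPy sketch for a single hidden state follows; the function and parameter names are illustrative, not the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_softmaxes(h, W_ctx, W_prior, E, K):
    """Mixture-of-Softmaxes next-word distribution for one context vector.

    h:       (d,)     hidden state from the RNN
    W_ctx:   (K*d, d) projects h to K component context vectors
    W_prior: (K, d)   produces the mixture weights
    E:       (V, d)   output word-embedding matrix
    """
    d = h.shape[0]
    H = np.tanh(W_ctx @ h).reshape(K, d)   # (K, d) component contexts
    pi = softmax(W_prior @ h)              # (K,)  mixture weights
    P = softmax(H @ E.T, axis=-1)          # (K, V) per-component softmaxes
    # Mixing happens in probability space, so log of the result is not a
    # rank-d bilinear form -- this is what lifts the softmax bottleneck.
    return pi @ P                          # (V,) valid distribution

# Small random demo (illustrative shapes only).
rng = np.random.default_rng(0)
d, K, V = 8, 3, 50
p = mixture_of_softmaxes(
    rng.standard_normal(d),
    rng.standard_normal((K * d, d)),
    rng.standard_normal((K, d)),
    rng.standard_normal((V, d)),
    K,
)
```

A single softmax over shared embeddings corresponds to the special case K = 1; the paper's point is that K > 1 components are needed for the model's log-probability matrix to reach the high rank that context-dependent natural language demands.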
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL 40.68 | 1541 |
| Language Modeling | WikiText-103 (test) | Perplexity 29.2 | 524 |
| Language Modeling | PTB (test) | Perplexity 38.04 | 471 |
| Language Modeling | Penn Treebank (test) | Perplexity 47.69 | 411 |
| Language Modeling | WikiText2 v1 (test) | Perplexity 40.68 | 341 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) 42.4 | 277 |
| Language Modeling | WikiText-103 (val) | PPL 29 | 180 |
| Language Modeling | Penn Treebank (val) | Perplexity 48.3 | 178 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity 47.69 | 120 |
| Language Modeling | One Billion Word Benchmark (test) | Test Perplexity 37.1 | 108 |
Showing 10 of 22 rows.