Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

About

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.

Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen • 2017
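The "simple and effective method" the abstract refers to is the paper's Mixture of Softmaxes (MoS): instead of one Softmax over a single low-rank logit matrix, the model mixes K Softmax components with context-dependent weights, so the log-probability matrix is no longer constrained to low rank. Below is a minimal NumPy sketch of a single MoS step; the dimensions, random parameters, and variable names (`W`, `E`, `w_pi`) are illustrative stand-ins, not the paper's trained weights.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, V, K = 8, 20, 3                     # hidden size, vocab size, number of mixtures

h = rng.standard_normal(d)             # context vector from the RNN
W = rng.standard_normal((K, d, d))     # per-component context projections
E = rng.standard_normal((V, d))        # shared output word embeddings
w_pi = rng.standard_normal((K, d))     # projection producing mixture weights

pi = softmax(w_pi @ h)                 # mixture weights, shape (K,), sum to 1
hk = np.tanh(np.einsum('kij,j->ki', W, h))  # K component contexts, shape (K, d)
comp = softmax(hk @ E.T, axis=-1)      # K component distributions over vocab, (K, V)
p = pi @ comp                          # final next-word distribution, shape (V,)
```

Because `p` is a convex combination of K Softmax distributions, `log p` is generally not expressible as a single rank-d logit matrix times the embedding table, which is exactly how MoS sidesteps the Softmax bottleneck.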

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 40.68 | 1949 |
| Language Modeling | WikiText-103 (test) | Perplexity | 29.2 | 579 |
| Language Modeling | PTB (test) | Perplexity | 38.04 | 526 |
| Language Modeling | Penn Treebank (test) | Perplexity | 47.69 | 411 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) | 42.4 | 387 |
| Language Modeling | WikiText2 v1 (test) | Perplexity | 40.68 | 383 |
| Language Modeling | WikiText-103 (val) | PPL | 29 | 214 |
| Language Modeling | Penn Treebank (val) | Perplexity | 48.3 | 178 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity | 47.69 | 120 |
| Language Modeling | One Billion Word Benchmark (test) | Test Perplexity | 37.1 | 113 |

Showing 10 of 22 rows.
