Direct Output Connection for a High-Rank Language Model
About
This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from the final RNN layer but also from middle layers. The proposed method increases the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). It improves on the previous state-of-the-art language model and achieves the best scores on the Penn Treebank and WikiText-2 datasets, which are standard benchmarks. Moreover, we show that the proposed method also contributes to two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
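The core idea, combining per-layer probability distributions into one output distribution, can be sketched in a few lines. The following is a minimal NumPy sketch, not the paper's implementation: it assumes each connected layer has its own output projection and that the mixture weights are softmax-normalized; all names (`doc_mixture`, `hiddens`, `out_mats`, `mix_logits`) are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def doc_mixture(hiddens, out_mats, mix_logits):
    """Mix word distributions computed from several RNN layers.

    hiddens:    list of k hidden vectors, each of shape (d,),
                e.g. the final layer plus selected middle layers
    out_mats:   list of k output matrices, each of shape (V, d)
    mix_logits: unnormalized mixture weights, shape (k,)
                (illustrative; how the weights are produced is a
                modeling choice, not fixed by this sketch)
    """
    weights = softmax(mix_logits)                       # k mixture weights, sum to 1
    dists = [softmax(W @ h) for h, W in zip(hiddens, out_mats)]  # one softmax per layer
    # Convex combination of valid distributions is itself a valid distribution.
    return sum(w * d for w, d in zip(weights, dists))
```

Because each component is a full softmax over the vocabulary, the mixture is no longer a single low-rank log-linear model, which is the source of the added expressive power under the matrix factorization view.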
Sho Takase, Jun Suzuki, Masaaki Nagata • 2018
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 53.09 | 1541 |
| Language Modeling | Penn Treebank (test) | Perplexity | 47.17 | 411 |
| Language Modeling | WikiText2 v1 (test) | Perplexity | 58.01 | 341 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) | 54.91 | 277 |
| Language Modeling | Penn Treebank (val) | Perplexity | 54.12 | 178 |
| Constituent Parsing | PTB (test) | F1 | 94.47 | 127 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity | 52.4 | 120 |
| Language Modeling | PTB (val) | Perplexity | 48.63 | 83 |
| Text Summarization | Gigaword (test) | ROUGE-1 | 46.99 | 75 |
| Language Modeling | Penn Treebank (PTB) (val) | Perplexity | 54.1 | 70 |
Showing 10 of 16 rows.