Dependency-based Mixture Language Models
About
Various models have been proposed to incorporate knowledge of syntactic structure into neural language models. However, previous work has relied heavily on elaborate components tailored to a specific language model, usually a recurrent neural network (RNN), which makes these approaches hard to transfer to other neural language models such as Transformer and GPT-2. In this paper, we introduce Dependency-based Mixture Language Models. Specifically, we first train a neural language model with a novel dependency modeling objective to learn, for each context position, the probability distribution over its future dependent tokens. We then form the next-token probability by mixing these dependency modeling distributions, using self-attention scores as the mixture weights. Extensive experiments and human evaluations show that our method can be easily and effectively applied to different neural language models and improves neural text generation on various tasks.
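The mixture step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each context position carries a precomputed dependency modeling distribution over the vocabulary, and mixes them with softmax-normalized attention scores. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, ctx_len = 10, 4

# p_dep[t]: distribution over the vocabulary of the future dependent
# tokens of context position t (from the dependency modeling objective).
logits = rng.normal(size=(ctx_len, vocab_size))
p_dep = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Self-attention scores of the current position over the context,
# softmax-normalized to serve as mixture weights.
scores = rng.normal(size=ctx_len)
attn = np.exp(scores) / np.exp(scores).sum()

# Next-token probability: attention-weighted mixture of the
# per-position dependency distributions.
p_next = attn @ p_dep  # shape: (vocab_size,)
```

Because each `p_dep[t]` sums to 1 and the attention weights sum to 1, the mixture `p_next` is itself a valid probability distribution.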
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity | 56.2 | 120 |
| Language Modeling | Penn Treebank (PTB) (val) | Perplexity | 58.6 | 70 |
| Unconditional Text Generation | EMNLP2017 WMT News | Perplexity | 36.11 | 64 |
| Conditional Text Generation | ROCStories (test) | UNION | 85.31 | 8 |
| Unconditional Text Generation | EMNLP2017 WMT News | Human Score | 0.512 | 8 |
| Unconditional Text Generation | EMNLP2017 WMT News | LM Score | 5.14 | 8 |
| Conditional Text Generation | ROCStories | Grammaticality Win Rate | 36.2 | 5 |