
Unsupervised Recurrent Neural Network Grammars

About

Recurrent neural network grammars (RNNGs) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
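The training objective described above, the evidence lower bound (ELBO), can be illustrated with a toy example. The sketch below uses a hypothetical, enumerable set of candidate trees with made-up scores; in the actual paper the joint log-probability comes from the RNNG decoder and the approximate posterior q(z | x) from the neural CRF inference network, and the expectation is estimated by sampling rather than enumeration.

```python
import math

# Hypothetical joint log-probabilities log p(x, z) under the generative
# model, for a small enumerable set of candidate trees z of one sentence x.
log_joint = {"tree_a": -10.0, "tree_b": -12.0, "tree_c": -15.0}

# Hypothetical approximate posterior q(z | x) from the inference network.
q = {"tree_a": 0.7, "tree_b": 0.2, "tree_c": 0.1}

# ELBO = E_q[log p(x, z) - log q(z | x)] = E_q[log p(x, z)] + H(q)
elbo = sum(q[z] * (log_joint[z] - math.log(q[z])) for z in q)

# Exact log-evidence log p(x) = log sum_z p(x, z); the ELBO never exceeds it.
log_evidence = math.log(sum(math.exp(v) for v in log_joint.values()))

print(f"ELBO = {elbo:.4f} <= log p(x) = {log_evidence:.4f}")
```

The gap between the two quantities is the KL divergence from q(z | x) to the true posterior p(z | x); maximizing the ELBO with respect to the inference network shrinks this gap, while maximizing it with respect to the generative model raises log p(x) itself.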

Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis • 2019

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Language Modeling | PTB (test) | Perplexity 78.3 | 471
Language Modeling | Penn Treebank (PTB) (test) | Perplexity 85.9 | 120
Unsupervised Parsing | PTB (test) | F1 Score 72.8 | 75
Grammar Induction | PTB English (test) | F1 Score 67.7 | 29
Unsupervised Constituency Parsing | WSJ (test) | -- | 29
Unlabeled Parsing | Penn Treebank WSJ (test) | -- | 25
Unsupervised Constituency Parsing | WSJ10 (test) | UF1 Score 51.1 | 24
Language Modeling | CTB (test) | Perplexity 181.1 | 16
Syntactic Evaluation | Marvin and Linzen | Syntactic Evaluation Score 76.1 | 15
Unsupervised Parsing | Penn Treebank WSJ (section 23, test) | F1 Score 45.4 | 15

Showing 10 of 18 rows

Other info

Code
