Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders

About

We introduce deep inside-outside recursive autoencoders (DIORA), a fully unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence and uses inside-outside dynamic programming to consider all possible binary trees over the sentence. At test time, the CKY algorithm extracts the highest-scoring parse. DIORA achieves a new state-of-the-art F1 in unsupervised binary constituency parsing (unlabeled) on two benchmark datasets, WSJ and MultiNLI.
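
The test-time step described above, extracting the highest-scoring binary parse with CKY, can be illustrated with a minimal sketch. This is not DIORA's implementation: the function name `cky_best_tree` and the `span_score` callback are hypothetical, and the per-span scores are assumed to be given, whereas in DIORA they would come from the learned inside pass.

```python
def cky_best_tree(words, span_score):
    """Return (score, tree) for the highest-scoring binary tree over `words`.

    `span_score(i, j)` is a hypothetical stand-in that scores the span
    words[i:j]; a tree's score is the sum of the scores of its spans.
    """
    n = len(words)
    if n == 0:
        raise ValueError("cannot parse an empty sentence")

    best = {}  # (i, j) -> (best score, best tree) for the span words[i:j]

    # Base case: each word is a leaf.
    for i in range(n):
        best[(i, i + 1)] = (span_score(i, i + 1), words[i])

    # Fill the chart bottom-up, from short spans to the full sentence.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Choose the split point whose two children score highest together.
            k = max(range(i + 1, j),
                    key=lambda s: best[(i, s)][0] + best[(s, j)][0])
            left, right = best[(i, k)], best[(k, j)]
            total = span_score(i, j) + left[0] + right[0]
            best[(i, j)] = (total, (left[1], right[1]))

    return best[(0, n)]


# Toy scores (assumed, for illustration): only the span "the cat" is rewarded.
toy_scores = {(0, 2): 1.0}
score, tree = cky_best_tree(["the", "cat", "sat"],
                            lambda i, j: toy_scores.get((i, j), 0.0))
print(tree)
```

The chart has O(n^2) cells and each cell tries O(n) split points, so the search covers every binary tree over the sentence in O(n^3) time without enumerating the trees explicitly.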

Andrew Drozdov, Pat Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum • 2019

Related benchmarks

Task	Dataset	Result
Unsupervised Parsing	PTB (test)	F1 Score: 62.3	75
Unsupervised Constituency Parsing	SUSANNE (test)	F1 Score: 44	32
Grammar Induction	PTB English (test)	F1 Score: 55.7	29
Unsupervised Constituency Parsing	WSJ (test)	Max F1: 56.8	29
Unlabeled Parsing	Penn Treebank WSJ (test)	--	25
Unsupervised Constituency Parsing	Penn TreeBank English (test)	Mean S-F1: 55.7	16
Unsupervised Parsing	Penn Treebank WSJ (section 23 test)	F1 Score: 51.4	15
Grammar Induction	PTB binarized (test)	F1 Score: 49.6	6

