R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

About

Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY-style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.
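
To make the idea of a differentiable CKY-style chart concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: `compose` is a hypothetical single tanh layer standing in for the Transformer-based composition, the scoring vector `v` and the plain softmax over split points are assumptions, and the full chart is filled with no pruning, so the number of candidate compositions grows cubically with sentence length rather than linearly.

```python
import math
import random

def compose(left, right, W):
    # Hypothetical stand-in for the paper's Transformer-based composition:
    # one tanh layer applied to the concatenated child vectors.
    x = left + right
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cky_encode(embeddings, W, v):
    """Fill a CKY-style chart bottom-up; each span is a soft mixture over its splits."""
    n = len(embeddings)
    chart = {(i, i): embeddings[i] for i in range(n)}  # spans are inclusive (i, j)
    split_weights = {}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # One candidate representation per binary split point k.
            candidates = [compose(chart[(i, k)], chart[(k + 1, j)], W)
                          for k in range(i, j)]
            # Assumed scoring: dot product with a vector v, then softmax.
            scores = [sum(a * b for a, b in zip(v, c)) for c in candidates]
            weights = softmax(scores)
            dim = len(candidates[0])
            chart[(i, j)] = [sum(w * c[t] for w, c in zip(weights, candidates))
                             for t in range(dim)]
            split_weights[(i, j)] = weights
    return chart, split_weights

def best_tree(split_weights, i, j):
    # Read off a binary tree by taking the highest-weight split of each span.
    if i == j:
        return i
    w = split_weights[(i, j)]
    k = i + w.index(max(w))
    return (best_tree(split_weights, i, k), best_tree(split_weights, k + 1, j))

random.seed(0)
d, n = 4, 5  # toy embedding size and sentence length
embeddings = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
W = [[random.gauss(0, 0.5) for _ in range(2 * d)] for _ in range(d)]
v = [random.gauss(0, 1) for _ in range(d)]

chart, split_weights = cky_encode(embeddings, W, v)
tree = best_tree(split_weights, 0, n - 1)
print(tree)  # nested tuples over token indices 0..4
```

Because every span vector is a weighted sum over its possible splits, the root representation `chart[(0, n - 1)]` is a smooth function of the parameters, which is what lets tree structure be learned by gradient descent in a setup like this. The pruned induction algorithm mentioned in the abstract is not shown here.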

Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo • 2021

Related benchmarks

Task	Dataset	Metric	Result	
Unsupervised Parsing	PTB (test)	--	--	75
Unsupervised Constituency Parsing	Chinese Treebank (CTB) (test)	Unlabeled Sentence F1 (Mean)	44.9	36
Natural Language Understanding	GLUE 1.0 (test)	SST-2 (Acc)	89.33	28
Unsupervised Parsing	Penn Treebank WSJ (section 23 test)	F1 Score	52.28	15
Unsupervised Parsing	Chinese Penn Treebank (CTB) 8.0 (test)	F1	63.94	12
Unsupervised Constituency Parsing	WSJ word-level gold trees (test)	F1	48.11	8
Dependency Tree Compatibility	WSJ Penn Treebank (test)	Compatibility (%) - All	0.6929	7
Unsupervised Constituency Parsing	CTB word-level gold trees (test)	F1 Score	44.85	7
Dependency Tree Compatibility	CTB (test)	All	64.74	5
Unsupervised Constituency Parsing	WSJ word-piece level gold trees (test)	F1 Score	52.28	3

