Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation
About
Recently CKY-based models show great potential in unsupervised grammar induction thanks to their human-like encoding paradigm, which runs recursively and hierarchically, but requires $O(n^3)$ time-complexity. Recursive Transformer based on Differentiable Trees (R2D2) makes it possible to scale to large language model pre-training even with complex tree encoder by introducing a heuristic pruning method. However, the rule-based pruning approach suffers from local optimum and slow inference issues. In this paper, we fix those issues in a unified method. We propose to use a top-down parser as a model-based pruning method, which also enables parallel encoding during inference. Typically, our parser casts parsing as a split point scoring task, which first scores all split points for a given sentence, and then recursively splits a span into two by picking a split point with the highest score in the current span. The reverse order of the splits is considered as the order of pruning in R2D2 encoder. Beside the bi-directional language model loss, we also optimize the parser by minimizing the KL distance between tree probabilities from parser and R2D2. Our experiments show that our Fast-R2D2 improves performance significantly in grammar induction and achieves competitive results in downstream classification tasks.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Unsupervised Parsing | PTB (test) | F1 Score57.22 | 75 | |
| Unsupervised Constituency Parsing | Chinese Treebank (CTB) (test) | Unlabeled Sentence F1 (Mean)45.3 | 36 | |
| Grammar Induction | PTB English (test) | F1 Score57.2 | 29 | |
| Natural Language Understanding | GLUE 1.0 (test) | CoLA (MCC)40.11 | 25 | |
| Unsupervised Parsing | Penn Treebank WSJ (section 23 test) | F1 Score57.22 | 15 | |
| Unsupervised Parsing | Chinese Penn Treebank (CTB) 8.0 (test) | F167.79 | 12 |