
Co-training an Unsupervised Constituency Parser with Weak Supervision

About

We introduce a method for unsupervised parsing that relies on bootstrapped classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifier: an inside classifier that acts on a span, and an outside classifier that acts on everything outside a given span. Through self-training and co-training with these two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, yields an effective parser. A seed bootstrapping technique prepares the data used to train the classifiers. Our analyses further validate that this approach, combined with weak supervision from prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects a strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we demonstrate the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), achieving new state-of-the-art results. Our code and pre-trained models are available at https://github.com/Nickil21/weakly-supervised-parsing.
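To make the co-training idea concrete, here is a minimal, self-contained sketch of the loop described above: an "inside" classifier scores a span by features of the span itself, an "outside" classifier by features of everything else, and each round the two exchange their most confident pseudo-labels. All names, the toy one-dimensional features, and the nearest-centroid classifier are illustrative assumptions for this sketch; the paper's actual features, models, and seed-bootstrapping procedure differ.

```python
# Illustrative co-training sketch for inside/outside span classifiers.
# Sentences are toy lists of numbers; a "span" is (sent, i, j).
# These one-feature views and the centroid classifier are NOT the
# paper's method, only a minimal stand-in for the co-training loop.

def inside_view(sent, i, j):
    """Toy inside feature: mean of the values inside the span."""
    return sum(sent[i:j]) / (j - i)

def outside_view(sent, i, j):
    """Toy outside feature: mean of the values outside the span."""
    rest = sent[:i] + sent[j:]
    return sum(rest) / len(rest) if rest else 0.0

class CentroidClassifier:
    """Nearest-centroid classifier on a single feature (hypothetical)."""
    def fit(self, xs, ys):
        self.mu = {}
        for c in (0, 1):  # both labels must appear in the seed data
            vals = [x for x, y in zip(xs, ys) if y == c]
            self.mu[c] = sum(vals) / len(vals)
    def predict(self, x):
        return 1 if abs(x - self.mu[1]) < abs(x - self.mu[0]) else 0
    def confidence(self, x):
        return abs(abs(x - self.mu[0]) - abs(x - self.mu[1]))

def co_train(seed, unlabeled, rounds=3, top_k=2):
    """seed: list of ((sent, i, j), label); unlabeled: list of (sent, i, j).
    Each round, each classifier pseudo-labels its most confident spans
    for the *other* classifier's training pool."""
    in_data, out_data = list(seed), list(seed)
    clf_in, clf_out = CentroidClassifier(), CentroidClassifier()
    pool = list(unlabeled)
    for _ in range(rounds):
        clf_in.fit([inside_view(*s) for s, _ in in_data],
                   [y for _, y in in_data])
        clf_out.fit([outside_view(*s) for s, _ in out_data],
                    [y for _, y in out_data])
        if not pool:
            break
        # Inside classifier teaches the outside classifier.
        pool.sort(key=lambda s: clf_in.confidence(inside_view(*s)), reverse=True)
        picked, pool = pool[:top_k], pool[top_k:]
        out_data += [(s, clf_in.predict(inside_view(*s))) for s in picked]
        # Outside classifier teaches the inside classifier.
        pool.sort(key=lambda s: clf_out.confidence(outside_view(*s)), reverse=True)
        picked, pool = pool[:top_k], pool[top_k:]
        in_data += [(s, clf_out.predict(outside_view(*s))) for s in picked]
    return clf_in, clf_out
```

In the real system, the trained classifiers would score candidate spans and a chart algorithm would assemble the highest-scoring consistent bracketing; the sketch only shows how the two views bootstrap each other from a small seed set.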

Nickil Maveli, Shay B. Cohen • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Unsupervised Parsing | PTB (test) | – | – | 75 |
| Unsupervised Constituency Parsing | Chinese Treebank (CTB) (test) | Unlabeled Sentence F1 (Mean) | 41.8 | 36 |
| Unsupervised Constituency Parsing | WSJ (test) | Max F1 | 66.8 | 29 |
| Unsupervised Constituency Parsing | WSJ10 (test) | UF1 Score | 74.2 | 24 |
| Unsupervised Constituency Parsing | KTB-40 all sentences (test) | Mean Evalb F1 | 39.2 | 10 |
| Unsupervised Constituency Parsing | KTB length <= 10 (test) | Mean Evalb F1 | 56.7 | 10 |

Other info

Code

https://github.com/Nickil21/weakly-supervised-parsing
