
Grammar as a Foreign Language

About

Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain-specific, complex, and inefficient. In this paper we show that a domain-agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus annotated with existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, showing that the model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
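The key move that lets a sequence-to-sequence model do parsing is treating the parse tree itself as a "foreign language": the tree is flattened, depth-first, into a bracketed token sequence that the decoder emits one symbol at a time. Below is a minimal sketch of that linearization; the nested-tuple tree representation and the `linearize` helper are illustrative assumptions, not the paper's actual code.

```python
# Sketch of depth-first tree linearization (assumed helper, not the
# authors' implementation). Following the paper, part-of-speech tags
# at the leaves are normalized to a single "XX" symbol so the decoder
# only has to predict tree structure, not tags.

def linearize(tree):
    """Flatten a parse tree (label, children) into bracketed tokens."""
    label, children = tree
    if isinstance(children, str):      # preterminal: (POS tag, word)
        return ["XX"]
    tokens = ["(" + label]             # opening bracket carries the label
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)         # closing bracket repeats the label
    return tokens

tree = ("S",
        [("NP", [("NNP", "John")]),
         ("VP", [("VBZ", "sleeps")])])
print(" ".join(linearize(tree)))
# → (S (NP XX )NP (VP XX )VP )S
```

Labeling the closing brackets (")NP" rather than a bare ")") makes the target sequence easier for the decoder to keep well-formed, since each close explicitly names the constituent it ends.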

Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton • 2014

Related benchmarks

Task                          Dataset                                Result (F1)   Rank
Phrase-structure parsing      PTB (Section 23)                       92.1          56
Constituency parsing          Penn Treebank WSJ (Section 23, test)   92.8          55
Constituency parsing          WSJ Penn Treebank (test)               92.8          27
English constituency parsing  Wall Street Journal (Section 23)       92.1          12
Constituency parsing          Penn Treebank WSJ (Section 22, dev)    93.5           9
