Grammar as a Foreign Language
About
Syntactic constituency parsing is a fundamental problem in natural language processing that has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain-specific, complex, and inefficient. In this paper we show that a domain-agnostic, attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
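The key to treating parsing as sequence-to-sequence transduction is serializing the parse tree into a flat token sequence that the decoder can emit. A minimal sketch of such a linearization is below; the tuple-based tree encoding, the function name, and the use of an `XX` placeholder for part-of-speech tags are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch: linearize a constituency tree into a bracketed tag
# sequence, the kind of target a sequence-to-sequence parser can emit.
# Trees are (label, children) tuples; a node with no children stands in
# for a preterminal (POS tag), which we collapse to the placeholder "XX".

def linearize(tree):
    """Depth-first traversal emitting "(LABEL ... )LABEL" tokens."""
    label, children = tree
    if not children:                # preterminal: emit placeholder token
        return ["XX"]
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens

# Example: a two-word sentence with an NP subject and a VP predicate.
tree = ("S", [("NP", [("NNP", [])]), ("VP", [("VBZ", [])])])
print(" ".join(linearize(tree)))    # → (S (NP XX )NP (VP XX )VP )S
```

At decode time the model predicts such token sequences directly, and the tree is recovered by matching opening and closing bracket tokens.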
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Phrase-structure parsing | PTB (§23) | F1 | 92.1 | 56 |
| Constituency Parsing | Penn Treebank WSJ (section 23, test) | F1 | 92.8 | 55 |
| Constituency Parsing | WSJ Penn Treebank (test) | F1 | 92.8 | 27 |
| English constituency parsing | Wall Street Journal (WSJ) (Section 23) | F1 | 92.1 | 12 |
| Constituency Parsing | Penn Treebank WSJ section 22 (dev) | F1 | 93.5 | 9 |
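The F1 scores in the table above are bracketing F1: the harmonic mean of precision and recall over labeled constituent spans in the predicted versus gold trees. A minimal sketch of the computation, assuming each tree has already been reduced to a set of `(label, start, end)` spans (the span format and function name are illustrative, not the official evalb tool):

```python
# Hedged sketch of bracketing F1 over labeled constituent spans.
# gold and predicted are sets of (label, start, end) tuples, where
# start/end are word indices delimiting the constituent.

def bracket_f1(gold, predicted):
    """Harmonic mean of span precision and recall; 0.0 if no match."""
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

# Example: one VP span disagrees, so 2 of 3 brackets match.
gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}
pred = {("S", 0, 3), ("NP", 0, 1), ("VP", 2, 3)}
print(round(bracket_f1(gold, pred), 4))  # → 0.6667
```

In practice, published numbers come from the standard evalb scorer, which additionally handles punctuation and label-equivalence conventions.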