Discontinuous Grammar as a Foreign Language

About

In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank.
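To make the "parsing as translation" framing concrete, here is a minimal sketch of a bracketed linearization for an ordinary (continuous) constituent tree, in the spirit of earlier sequence-to-sequence parsers. It is not one of the paper's novel linearizations and cannot represent discontinuities. The tree encoding (nested `(label, children)` tuples with `(pos_tag, word)` leaves) and the function names `linearize` and `delinearize` are assumptions made for this illustration only.

```python
def is_leaf(tree):
    # Assumed encoding: a leaf is (pos_tag, word); an internal node is (label, [children]).
    return isinstance(tree[1], str)


def linearize(tree):
    """Depth-first bracket linearization of a continuous constituent tree."""
    if is_leaf(tree):
        return [tree[0]]  # emit the POS tag in place of the word
    label, children = tree
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens


def delinearize(tokens, words):
    """Rebuild the tree from the target tokens and the input words."""
    stack = [("ROOT", [])]  # dummy root that collects the finished tree
    word_iter = iter(words)
    for tok in tokens:
        if tok.startswith("("):
            stack.append((tok[1:], []))    # open a new constituent
        elif tok.startswith(")"):
            node = stack.pop()             # close the current constituent
            stack[-1][1].append(node)
        else:
            stack[-1][1].append((tok, next(word_iter)))  # attach the next word
    return stack[0][1][0]


example = ("S", [
    ("NP", [("NNP", "John")]),
    ("VP", [("VBZ", "has"), ("NP", [("DT", "a"), ("NN", "dog")])]),
    (".", "."),
])
target = linearize(example)
print(" ".join(target))
```

A sequence-to-sequence model would be trained to map the input words to a token sequence like `target`. Because constituents here are closed in strict nesting order, this encoding only covers trees whose constituents span contiguous words, which is the coverage limitation the abstract refers to.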

Daniel Fernández-González, Carlos Gómez-Rodríguez • 2021

Related benchmarks

Task	Dataset	Result
Constituent Parsing	PTB (test)	F1 95.84	155
Discontinuous constituent parsing	TIGER (test)	F1 Score 88.53	16
Discontinuous constituent parsing	NEGRA (test)	F1 Score 89.08	16
Discontinuous constituent parsing	DPTB (test)	F1 Score 95.48	15

