Stack-propagation: Improved Representation Learning for Syntax
About
Traditional syntax models typically leverage part-of-speech (POS) information by constructing features from hand-tuned templates. We demonstrate that a better approach is to utilize POS tags as a regularizer of learned representations. We propose a simple method for learning a stacked pipeline of models, which we call "stack-propagation". We apply this to dependency parsing and tagging, where we use the hidden layer of the tagger network as a representation of the input tokens for the parser. At test time, our parser does not require predicted POS tags. On 19 languages from the Universal Dependencies treebanks, our method is 1.3% (absolute) more accurate than a state-of-the-art graph-based approach and 2.7% more accurate than the most comparable greedy model.
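The core idea above can be sketched in a few lines: the tagger and parser share a hidden layer, the parser reads that hidden layer directly (never the predicted tags), and the tagging loss acts as a regularizer on the shared representation. The sketch below is a minimal forward-pass illustration with toy dimensions and randomly initialized weights; all names and sizes are hypothetical and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper)
d_in, d_hid, n_tags, n_actions = 8, 16, 5, 3

# Tagger network: token features -> shared hidden layer -> POS logits
W_hid = rng.normal(size=(d_in, d_hid)) * 0.1
W_tag = rng.normal(size=(d_hid, n_tags)) * 0.1
# Parser head consumes the tagger's hidden layer, not predicted tags
W_parse = rng.normal(size=(d_hid, n_actions)) * 0.1

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    h = np.tanh(x @ W_hid)           # shared representation of tokens
    p_tag = softmax(h @ W_tag)       # tagging head (regularizer)
    p_parse = softmax(h @ W_parse)   # parsing head reads h directly
    return h, p_tag, p_parse

# A batch of 4 tokens with dummy gold labels
x = rng.normal(size=(4, d_in))
y_tag = rng.integers(0, n_tags, size=4)
y_parse = rng.integers(0, n_actions, size=4)

h, p_tag, p_parse = forward(x)

# Joint objective: parser cross-entropy plus tagging cross-entropy,
# so gradients from both tasks shape the shared hidden layer
idx = np.arange(4)
loss = (-np.log(p_parse[idx, y_parse]).mean()
        - np.log(p_tag[idx, y_tag]).mean())
```

Because the parser depends only on the hidden layer `h`, the tagging head can be dropped entirely at test time, matching the paper's claim that no predicted POS tags are needed for parsing.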
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Part-of-Speech Tagging | UD Average 1.2 (test) | Accuracy | 95.4 | 22 |
| Dependency Parsing | Penn Treebank (PTB) Section 23 v2.2 (test) | UAS | 93.43 | 17 |
| Dependency Parsing | Universal Dependencies 1.2 (test) | UAS (de) | 74.2 | 11 |