Rethinking Self-Attention: Towards Interpretability in Neural Parsing
About
Attention mechanisms have improved the performance of NLP tasks while allowing models to remain explainable. Self-attention is now widely used; however, interpretability is difficult because of the large number of attention distributions. Recent work has shown that model representations can benefit from label-specific information, which also facilitates interpretation of predictions. We introduce the Label Attention Layer: a new form of self-attention where attention heads represent labels. We test our novel layer in constituency and dependency parsing experiments and show that the resulting model obtains new state-of-the-art results for both tasks on both the Penn Treebank (PTB) and the Chinese Treebank. Additionally, our model requires fewer self-attention layers than existing work. Finally, we find that the Label Attention heads learn relations between syntactic categories, and we show pathways for analyzing errors.
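The core idea, one attention head per label, can be illustrated with a minimal NumPy sketch. This is a simplified illustration, not the authors' implementation: it assumes each label head owns a single learned query vector (so each head yields exactly one attention distribution over the sentence), and it shares one key/value projection across heads for brevity. The function and variable names (`label_attention`, `Q`, `Wk`, `Wv`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_attention(X, Q, Wk, Wv):
    """Label Attention sketch: one attention distribution per label head.

    X:  (seq_len, d_model)  token representations
    Q:  (n_labels, d_head)  one learned query vector per label head
    Wk: (d_model, d_head)   key projection (shared across heads here)
    Wv: (d_model, d_head)   value projection (shared across heads here)

    Returns:
      ctx:  (n_labels, d_head)   label-specific context vectors
      attn: (n_labels, seq_len)  one distribution per label, which is
                                 what makes the heads interpretable
    """
    K = X @ Wk                                  # (seq_len, d_head)
    V = X @ Wv                                  # (seq_len, d_head)
    scores = Q @ K.T / np.sqrt(Q.shape[1])      # (n_labels, seq_len)
    attn = softmax(scores, axis=-1)
    ctx = attn @ V                              # (n_labels, d_head)
    return ctx, attn

# Toy usage: 5 tokens, 3 label heads.
rng = np.random.default_rng(0)
seq_len, d_model, d_head, n_labels = 5, 16, 8, 3
X = rng.normal(size=(seq_len, d_model))
Q = rng.normal(size=(n_labels, d_head))
Wk = rng.normal(size=(d_model, d_head))
Wv = rng.normal(size=(d_model, d_head))
ctx, attn = label_attention(X, Q, Wk, Wv)
```

Because each head produces a single distribution over the input tokens, `attn[i]` can be read directly as "what label head `i` attends to", rather than having to aggregate per-token attention maps.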
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Constituency Parsing | PTB (test) | F1 | 96.38 | 127 |
| Dependency Parsing | Chinese Treebank (CTB) (test) | UAS | 94.6 | 99 |
| Dependency Parsing | Penn Treebank (PTB) (test) | LAS | 96.3 | 80 |
| Constituency Parsing | CTB (test) | F1 | 92.64 | 45 |
| Constituency Parsing | CTB 5.1 (test) | F1 | 92.64 | 25 |
| Dependency Parsing | PTB | UAS | 97.4 | 24 |
| Constituency Parsing | CTB 5.0 (test) | F1 | 92.64 | 19 |
| Constituency Parsing | PTB (test) | Latency (s) | 40.8 | 3 |