Coreference Resolution through a seq2seq Transition-Based System
About
Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and links jointly. We implement the coreference system as a transition system and use multilingual T5 as the underlying language model. Using only CoNLL data for training, we obtain state-of-the-art accuracy on the CoNLL-2012 datasets: 83.3 F1 for English (2.3 points higher than previous work (Dobrovolskii, 2021)), 68.5 F1 for Arabic (+4.1 over previous work), and 74.3 F1 for Chinese (+5.3). In addition, we use the SemEval-2010 datasets for experiments in the zero-shot, few-shot, and supervised settings, the last using all available training data. We get substantially higher zero-shot F1-scores than previous approaches for 3 out of 4 languages and significantly exceed previous supervised state-of-the-art results for all five tested languages.
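To make the idea of treating coreference as a transition system more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the single `link` action, the `CorefState` class, and the `apply_actions` helper are hypothetical names, and the abstract does not specify the paper's actual action set or output format. In a seq2seq setup, the model would emit such actions as text; here they are given directly as tuples.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A mention is a (start, end) pair of token indices, end inclusive.
Span = Tuple[int, int]


@dataclass
class CorefState:
    """Clusters built so far; each cluster is a list of mention spans."""
    clusters: List[List[Span]] = field(default_factory=list)

    def find(self, span: Span) -> Optional[int]:
        """Return the index of the cluster containing `span`, if any."""
        for i, cluster in enumerate(self.clusters):
            if span in cluster:
                return i
        return None

    def link(self, mention: Span, antecedent: Span) -> None:
        """Hypothetical action: attach `mention` to its antecedent's cluster,
        creating that cluster if the antecedent has not been seen yet."""
        idx = self.find(antecedent)
        if idx is None:
            self.clusters.append([antecedent])
            idx = len(self.clusters) - 1
        if mention not in self.clusters[idx]:
            self.clusters[idx].append(mention)


def apply_actions(actions) -> List[List[Span]]:
    """Replay a sequence of (name, mention, antecedent) actions."""
    state = CorefState()
    for name, mention, antecedent in actions:
        if name != "link":
            raise ValueError(f"unknown action: {name}")
        state.link(mention, antecedent)
    return state.clusters


tokens = "Ann said she would call her brother".split()
# Assumed predictions: "she" -> "Ann", then "her" -> "she".
actions = [("link", (2, 2), (0, 0)), ("link", (5, 5), (2, 2))]
clusters = apply_actions(actions)
mentions = [[" ".join(tokens[s:e + 1]) for s, e in c] for c in clusters]
print(mentions)
```

The point of the sketch is that mentions and links come out of the same action sequence: each `link` both introduces a mention span and decides which cluster it belongs to, so no separate search over candidate spans is needed.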
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Coreference Resolution | CoNLL English 2012 (test) | MUC F1 | 87.8 | 114 |
| Coreference Resolution | OntoNotes | MUC | 87.8 | 23 |
| Coreference Resolution | English OntoNotes 5.0 (test) | MUC Precision | 87.4 | 18 |
| Coreference Resolution | CoNLL Chinese 2012 (test) | Avg F1 | 74.3 | 11 |
| Coreference Resolution | SemEval Spanish 2010 (test) | Avg F1 | 83.9 | 8 |
| Coreference Resolution | SemEval Catalan 2010 (test) | Avg F1 | 83.5 | 7 |
| Coreference Resolution | SemEval Dutch 2010 (test) | Avg F1 | 66.6 | 7 |
| Coreference Resolution | SemEval German 2010 (test) | Avg F1 | 86.4 | 7 |
| Coreference Resolution | SemEval Italian 2010 (test) | Avg F1 | 65.9 | 6 |
| Coreference Resolution | CoNLL Arabic 2012 (test) | MUC Precision | 71 | 3 |