GECToR -- Grammatical Error Correction: Tag, Not Rewrite
About
In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an $F_{0.5}$ of 65.3/66.5 on CoNLL-2014 (test) and $F_{0.5}$ of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times that of a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
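The core idea of tagging rather than rewriting can be illustrated with a toy decoder that applies token-level edit tags of the kind the paper describes ($KEEP, $DELETE, $APPEND_t, $REPLACE_t). This is a minimal sketch for illustration, not the authors' implementation; the example sentence and tag sequence are hypothetical.

```python
def apply_tags(tokens, tags):
    """Apply one edit tag per source token and return the corrected tokens.

    Toy version of GECToR-style basic transformations:
    $KEEP leaves the token, $DELETE drops it, $REPLACE_t substitutes t,
    and $APPEND_t inserts t after the current token.
    """
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)
        elif tag == "$DELETE":
            continue
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])
        elif tag.startswith("$APPEND_"):
            out.append(token)
            out.append(tag[len("$APPEND_"):])
    return out

tokens = ["He", "go", "to", "a", "school", "yesterday"]
tags = ["$KEEP", "$REPLACE_goes", "$KEEP", "$DELETE", "$KEEP", "$APPEND_."]
print(" ".join(apply_tags(tokens, tags)))  # He goes to school yesterday .
```

Because each token receives exactly one tag, the full model applies such corrections iteratively over several passes; the tagger's output can be decoded in a single pass per iteration, which is what makes inference much faster than autoregressive seq2seq rewriting.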
Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, Oleksandr Skurzhanskyi • 2020
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Grammatical Error Correction | CoNLL-2014 (test) | F0.5 | 66.5 | 207 |
| Grammatical Error Correction | BEA shared task 2019 (test) | F0.5 | 73.7 | 139 |
| Grammatical Error Correction | MuCGEC (test) | Precision | 46.72 | 34 |
| Grammatical Error Correction | BEA 2019 (dev) | F0.5 | 55.62 | 19 |
| Grammatical Error Correction | FCGEC (test) | Precision | 46.11 | 17 |
| Grammatical Error Correction | BEA 2019 (test) | F0.5 | 72.4 | 12 |
| Morph Resolution | LiveAMR (Test2) | Accuracy | 70.2 | 9 |
| Morph Resolution | LiveAMR (Test1) | Accuracy | 65.1 | 9 |
| Grammatical Error Correction | FCGEC | Exact Match | 15.66 | 9 |