An Analysis of Simple Data Augmentation for Named Entity Recognition
About
Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
Xiang Dai, Heike Adel • 2020
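To make "simple augmentation" for token-level sequence labeling more concrete, here is a minimal sketch of one such technique, mention replacement: an entity mention is swapped for another mention of the same type drawn from the training data, and the BIO labels are regenerated to match the new length. The function names (`extract_mentions`, `build_mention_bank`, `replace_mentions`), the probability `p`, and the toy sentences are all illustrative assumptions; this is not the authors' implementation and may differ from their exact procedure.

```python
import random
from collections import defaultdict


def extract_mentions(tokens, labels):
    """Return (start, end, type) spans from BIO labels; end is exclusive."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if lab.startswith("I-") and start is not None and lab[2:] == etype:
            continue  # still inside the current mention
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    return spans


def build_mention_bank(dataset):
    """Group every mention in the training set by entity type."""
    bank = defaultdict(list)
    for tokens, labels in dataset:
        for start, end, etype in extract_mentions(tokens, labels):
            bank[etype].append(tokens[start:end])
    return bank


def replace_mentions(tokens, labels, bank, p=0.5, rng=random):
    """With probability p, swap each mention for a same-type mention."""
    new_tokens, new_labels, cursor = [], [], 0
    for start, end, etype in extract_mentions(tokens, labels):
        # Copy the non-entity tokens before this mention unchanged.
        new_tokens += tokens[cursor:start]
        new_labels += labels[cursor:start]
        mention = tokens[start:end]
        if rng.random() < p:
            mention = rng.choice(bank[etype])
        # Regenerate BIO labels, since the new mention may differ in length.
        new_tokens += mention
        new_labels += ["B-" + etype] + ["I-" + etype] * (len(mention) - 1)
        cursor = end
    new_tokens += tokens[cursor:]
    new_labels += labels[cursor:]
    return new_tokens, new_labels


# Toy data, invented for illustration only.
train = [
    (["patient", "has", "chest", "pain"], ["O", "O", "B-problem", "I-problem"]),
    (["fever", "treated", "with", "aspirin"], ["B-problem", "O", "O", "B-treatment"]),
]
bank = build_mention_bank(train)
augmented = [replace_mentions(t, l, bank, p=0.5, rng=random.Random(0)) for t, l in train]
print(augmented)
```

Because labels are rebuilt per mention, tokens and labels stay aligned even when a one-token mention replaces a two-token one. Augmented sentences of this kind would typically be added to the original training set rather than replacing it.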
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Named Entity Recognition | CoNLL 2003 (test) | -- | 539 |
| Named Entity Recognition | CoNLL 03 | F1 (Entity) 83.74 | 102 |
| Named Entity Recognition | OntoNotes | F1-score 62.67 | 91 |
| Complex Named Entity Recognition | MultiCoNER (test) | Score (Bn) 39.9 | 76 |
| Named Entity Recognition | WNUT 2017 (test) | F1 Score 52.29 | 63 |
| Named Entity Recognition | MultiCoNER | F1 Score 0.5467 | 48 |
| Named Entity Recognition | NCBI | F1 Score 78.97 | 26 |
| Named Entity Recognition | bc2gm | Entity F1 60.46 | 21 |
| Named Entity Recognition | Twitter NER | F1 Score 73.69 | 14 |
| Named Entity Recognition | CoNLL | F1 Score 0.8508 | 10 |