An Analysis of Simple Data Augmentation for Named Entity Recognition

About

Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
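The abstract does not spell out which augmentation operations are compared. As one illustration of what a simple augmentation for token-level sequence labeling can look like, the sketch below swaps an entity mention for another mention of the same type and rewrites the labels to match. It assumes BIO-tagged data; the function names (`extract_mentions`, `build_mention_bank`, `replace_mentions`), the probability `p`, and the toy sentences are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def extract_mentions(tokens, labels):
    """Return (start, end, type) spans from BIO labels; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, lab in enumerate(labels):
        # Close the open span when the current label does not continue it.
        if start is not None and lab != f"I-{etype}":
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    if start is not None:
        spans.append((start, len(labels), etype))
    return spans


def build_mention_bank(corpus):
    """Collect every mention in the training corpus, grouped by entity type."""
    bank = defaultdict(list)
    for tokens, labels in corpus:
        for s, e, t in extract_mentions(tokens, labels):
            bank[t].append(tokens[s:e])
    return bank


def replace_mentions(tokens, labels, bank, p=0.5, rng=random):
    """With probability p, replace each mention with a same-type mention."""
    new_tokens, new_labels = [], []
    pos = 0
    for s, e, t in extract_mentions(tokens, labels):
        # Copy the tokens between the previous mention and this one unchanged.
        new_tokens.extend(tokens[pos:s])
        new_labels.extend(labels[pos:s])
        mention = tokens[s:e]
        if bank.get(t) and rng.random() < p:
            mention = rng.choice(bank[t])
        new_tokens.extend(mention)
        # Re-tag the (possibly longer or shorter) mention in BIO format.
        new_labels.extend([f"B-{t}"] + [f"I-{t}"] * (len(mention) - 1))
        pos = e
    new_tokens.extend(tokens[pos:])
    new_labels.extend(labels[pos:])
    return new_tokens, new_labels


if __name__ == "__main__":
    corpus = [
        (["Patient", "has", "chest", "pain", "."],
         ["O", "O", "B-problem", "I-problem", "O"]),
        (["No", "fever", "today", "."],
         ["O", "B-problem", "O", "O"]),
    ]
    bank = build_mention_bank(corpus)
    print(replace_mentions(*corpus[0], bank, p=1.0, rng=random.Random(0)))
```

Because the replacement mention may differ in length from the original, the labels are regenerated rather than copied, which keeps tokens and labels aligned.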

Xiang Dai, Heike Adel • 2020

Related benchmarks

Task	Dataset	Result
Named Entity Recognition	CoNLL 2003 (test)	--	556
Named Entity Recognition	CoNLL 03	--	140
Named Entity Recognition	OntoNotes	F1-score 62.67	121
Complex Named Entity Recognition	MultiCoNER (test)	Score (Bn) 39.9	76
Named Entity Recognition	WNUT 2017 (test)	F1 Score 52.29	63
Named Entity Recognition	MultiCoNER	F1 Score 0.5467	48
Named Entity Recognition	bc2gm	Entity F1 60.46	48
Named Entity Recognition	NCBI	F1 Score 78.97	29
Named Entity Recognition	Twitter NER	F1 Score 73.69	23
Named Entity Recognition	CoNLL	F1 Score 0.8508	10

Showing 10 of 16 rows
