Simple Questions Generate Named Entity Recognition Datasets
About
Recent named entity recognition (NER) models often rely on human-annotated datasets, which require significant expert knowledge of the target domain and its entities. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking simple natural-language questions to an open-domain question answering system (e.g., "Which disease?"). Despite using fewer in-domain resources, our models, trained solely on the generated datasets, largely outperform strong low-resource models by an average F1 score of 19.4 on six popular NER benchmarks. Furthermore, our models are competitive with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by an F1 score of 5.2 on three benchmarks and achieve new state-of-the-art performance.
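The core idea above — querying a QA system with a simple question and turning the answer span into NER labels — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the QA call is stubbed with a hypothetical `mock_qa` function, and the span-to-BIO conversion assumes pre-tokenized text with exact token matching.

```python
# Sketch of the ask-to-generate idea: a (mocked) open-domain QA system
# answers a simple question such as "Which disease?" over a passage, and
# the returned answer span is converted into BIO-style NER annotations.

def answer_span_to_bio(tokens, answer_tokens, entity_type):
    """Tag every occurrence of answer_tokens in tokens with B-/I- labels."""
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    if n == 0:
        return tags
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer_tokens:
            tags[i] = f"B-{entity_type}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{entity_type}"
    return tags

def mock_qa(question, passage):
    """Stand-in for a real open-domain QA model (hypothetical)."""
    answers = {"Which disease?": "lung cancer"}
    return answers.get(question, "")

passage = "Smoking is a major risk factor for lung cancer ."
tokens = passage.split()
answer = mock_qa("Which disease?", passage)
tags = answer_span_to_bio(tokens, answer.split(), "Disease")
print(list(zip(tokens, tags)))
```

In the actual approach, the QA system runs over large unlabeled corpora, so each question type ("Which disease?", "Which person?", …) yields weakly labeled sentences for one entity type, which are then combined into a training set.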
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Named Entity Recognition | CoNLL 2003 (test) | F1 Score | 65.4 | 539 |
| Named Entity Recognition | BC5CDR (test) | Macro F1 (span-level) | 64.9 | 80 |
| Named Entity Recognition | NCBI-disease (test) | Precision | 59 | 40 |
| Named Entity Recognition | WNUT 2016 (test) | F1 Score | 36.5 | 26 |
| Named Entity Recognition | Wikigold (test) | F1 Score | 41.3 | 10 |