Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction
About
We introduce a multi-task setup for identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
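The core idea of shared span representations feeding several task heads can be sketched as follows. This is a minimal, illustrative toy, not the actual SciIE architecture: all dimensions, weight names, and the simple endpoint-concatenation span encoding are assumptions for the sketch.

```python
import random

random.seed(0)

# Illustrative sizes (assumptions, not from the paper).
TOKEN_DIM = 8
N_ENTITY_TYPES, N_RELATION_TYPES = 6, 7
SPAN_DIM = 2 * TOKEN_DIM          # span = [start token; end token]

def rand_vec(n):
    return [random.gauss(0, 1) for _ in range(n)]

def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def matvec(vec, mat):
    """vec (len == rows) times mat (rows x cols) -> list of cols scores."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def span_rep(token_embs, start, end):
    """Shared span representation: concatenate the embeddings of the
    span's first and last tokens (a simplified endpoint encoding)."""
    return token_embs[start] + token_embs[end]

# Task-specific heads all consume the SAME span representation,
# which is what lets the three tasks share signal.
W_entity = rand_mat(SPAN_DIM, N_ENTITY_TYPES)
W_relation = rand_mat(2 * SPAN_DIM, N_RELATION_TYPES)
w_coref = rand_vec(2 * SPAN_DIM)

def entity_scores(span):
    return matvec(span, W_entity)

def relation_scores(span_a, span_b):
    return matvec(span_a + span_b, W_relation)

def coref_score(span_a, span_b):
    return sum(x * w for x, w in zip(span_a + span_b, w_coref))

# Mock contextual token embeddings for a 10-token sentence.
tokens = [rand_vec(TOKEN_DIM) for _ in range(10)]
s1 = span_rep(tokens, 0, 2)
s2 = span_rep(tokens, 4, 5)
print(len(entity_scores(s1)), len(relation_scores(s1, s2)), coref_score(s1, s2))
```

In the real model the spans come from a learned span enumerator over contextualized embeddings and the heads are trained jointly, so coreference links can propagate evidence across sentences; the sketch only shows the weight-sharing shape of the setup.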
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Relation Extraction | SciERC (test) | F1 Score | 39.3 | 23 |
| Entity Recognition | SciERC (test) | F1 Score | 64.2 | 20 |
| Keyphrase Extraction | SemEval Task 10 ScienceIE 2017 (test) | F1 Score | 46 | 15 |
| Coreference Resolution | SciERC (test) | Precision | 52 | 7 |
| Entity Recognition | SciERC (dev) | Precision | 70 | 6 |
| Coreference Resolution | STM corpus, five-fold cross-validation (test) | MUC Precision | 60.3 | 6 |
| Relation Extraction | SciERC (dev) | Precision | 45.4 | 4 |
| TDM Triple Extraction | NLP-TDMS, excluding papers with 'Unknown' annotation (test) | Macro Precision | 24.9 | 4 |
| Task + Dataset + Metric Extraction | NLP-TDMS (test) | Macro Precision | 0.181 | 4 |
| Span Identification | SemEval ScienceIE Task 10 2017 (test) | F1 Score | 58.6 | 3 |