ELECTRAMed: a new pre-trained language representation model for biomedical NLP
About
The overwhelming amount of biomedical scientific texts calls for the development of effective language models able to tackle a wide range of biomedical natural language processing (NLP) tasks. The most recent dominant approaches are domain-specific models, initialized with general-domain textual data and then trained on a variety of scientific corpora. However, it has been observed that for specialized domains in which large corpora exist, training a model from scratch with just in-domain knowledge may yield better results. Moreover, the increasing focus on the compute costs of pre-training has recently led to the design of more efficient architectures, such as ELECTRA. In this paper, we propose a pre-trained domain-specific language model, called ELECTRAMed, suited for the biomedical field. The novel approach inherits the learning framework of the general-domain ELECTRA architecture, as well as its computational advantages. Experiments performed on benchmark datasets for several biomedical NLP tasks support the usefulness of ELECTRAMed, which sets a new state-of-the-art result on the BC5CDR corpus for named entity recognition and provides the best outcome in 2 of the 5 runs of the 7th BioASQ-factoid Challenge for the question answering task.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Named Entity Recognition | BC5CDR (test) | Macro F1 (span-level) | 90.03 | 80 |
| Named Entity Recognition | NCBI-disease (test) | Precision | 85.87 | 40 |
| Named Entity Recognition | JNLPBA (test) | Macro F1 (span-level) | 73.65 | 23 |
| Question Answering | BioASQ factoid 7b (test) | SAcc | 44.62 | 13 |
| DDI extraction | DDIExtraction 2013 | F1 Score | 79.13 | 10 |
| Relation Extraction | ChemProt | F1 Score | 72.94 | 10 |
| Factoid Question Answering | BioASQ-factoid Challenge 7h (live runs) | Batch 1 Score | 1 | 5 |
| Question Answering | BioASQ 7b-factoid Batch 4 | SACC | 61.18 | 4 |
| Question Answering | BioASQ factoid Batch 2 7b | SACC | 46.4 | 4 |
| Question Answering | BioASQ 7b-factoid Batch 5 | SACC | 24.57 | 4 |
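
The named entity recognition rows report a span-level macro F1. The page does not say which evaluation script produced these numbers, so the following is only a minimal sketch of how such a metric is commonly computed, assuming BIO-tagged sequences and exact-match spans. The function names `extract_spans` and `macro_span_f1` are hypothetical, and the sketch scores a single tag sequence for brevity.

```python
def extract_spans(tags):
    """Return the set of (entity_type, start, end) spans in a BIO tag sequence.

    Assumption: a span must open with a B- tag; a stray I- tag with no
    matching open span is ignored.
    """
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = start is not None and tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((etype, start, i))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def macro_span_f1(gold_tags, pred_tags):
    """Average the per-entity-type F1, counting only exact span matches."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    types = sorted({span[0] for span in gold | pred})
    scores = []
    for t in types:
        g = {s for s in gold if s[0] == t}
        p = {s for s in pred if s[0] == t}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative tags only, not taken from any benchmark.
gold = ["B-Chemical", "I-Chemical", "O", "B-Disease", "I-Disease"]
pred = ["B-Chemical", "I-Chemical", "O", "B-Disease", "O"]
score = macro_span_f1(gold, pred)
```

In this toy case the Chemical span matches exactly while the Disease span is truncated, so a partially correct prediction earns no credit for that type. This strictness is what distinguishes span-level scoring from token-level scoring.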