Biomedical Named Entity Recognition at Scale

About

Named entity recognition (NER) is a widely applicable natural language processing task and building block of question answering, topic modeling, information retrieval, etc. In the medical domain, NER plays a crucial role by extracting meaningful chunks from clinical notes and reports, which are then fed to downstream tasks like assertion status detection, entity resolution, relation extraction, and de-identification. Reimplementing a Bi-LSTM-CNN-Char deep learning architecture on top of Apache Spark, we present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks without using heavy contextual embeddings like BERT. This includes improving BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6% gain), and JNLPBA to 81.29% (5.2% gain). In addition, this model is freely available within a production-grade code base as part of the open-source Spark NLP library; can scale up for training and inference in any Spark cluster; has GPU support and libraries for popular programming languages such as Python, R, Scala and Java; and can be extended to support other human languages with no code changes.

Veysel Kocaman, David Talby • 2020

Related benchmarks

Task | Dataset | Result | Rank
Named Entity Recognition | BC5CDR (test) | Macro F1 (span-level): 89.73 | 80
Named Entity Recognition | BC5CDR | F1 Score: 89.73 | 59
Named Entity Recognition | NCBI-disease (test) | -- | 40
Named Entity Recognition | NCBI-disease | F1 Score: 89.13 | 29
Named Entity Recognition | JNLPBA (test) | Macro F1 (span-level): 81.29 | 23
Named Entity Recognition | AnatEM | F1 Score: 89.13 | 21
Named Entity Recognition | BC4CHEMD | F1 Score: 93.72 | 14
Named Entity Recognition | NBCI-Disease preprocessed (test) | Micro F1 (Excl. O): 89.13 | 4
Named Entity Recognition | BC5CDR preprocessed (test) | Micro F1 (excl O): 89.73 | 4
Named Entity Recognition | BC4CHEMD preprocessed (test) | Micro F1 (excl O): 93.72 | 4
Showing 10 of 18 rows
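
The results above mix span-level, micro-averaged, and macro-averaged F1. As a rough illustration of how those averages differ, here is a minimal sketch of span-level F1 over BIO-tagged sequences. It is not the evaluation script used by the paper or by this leaderboard. The function names `extract_spans` and `span_f1` are hypothetical, the BIO tagging scheme is an assumption, and the entity labels in the usage note are only illustrative.

```python
from collections import Counter


def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag
    sequence, with `end` exclusive. Assumes tags such as "B-Chemical",
    "I-Chemical", and "O"."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        # Close the open span unless this tag continues it.
        if start is not None and not (prefix == "I" and label == etype):
            spans.add((start, i, etype))
            start, etype = None, None
        # Open a span on B-, or leniently on a stray I- tag.
        if prefix in ("B", "I") and start is None:
            start, etype = i, label
    return spans


def span_f1(gold_seqs, pred_seqs):
    """Return (micro_f1, macro_f1) computed over exact-match entity spans.
    "O" tokens never form spans, so they do not contribute to the counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        for span in g & p:
            tp[span[2]] += 1  # predicted span matches gold exactly
        for span in p - g:
            fp[span[2]] += 1  # predicted span not in gold
        for span in g - p:
            fn[span[2]] += 1  # gold span that was missed

    def f1(t, f_pos, f_neg):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no TPs.
        return 2 * t / (2 * t + f_pos + f_neg) if t else 0.0

    types = set(tp) | set(fp) | set(fn)
    # Micro: pool counts across entity types, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per entity type, then take the unweighted mean.
    macro = sum(f1(tp[t], fp[t], fn[t]) for t in types) / len(types) if types else 0.0
    return micro, macro
```

For example, with gold tags `["B-Chemical", "I-Chemical", "O", "B-Disease"]` and predicted tags `["B-Chemical", "I-Chemical", "O", "O"]`, the Chemical span is matched and the Disease span is missed. `span_f1` then returns a micro F1 of about 0.667 and a macro F1 of 0.5, because the macro average weights the missed Disease type equally with the matched Chemical type.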
