ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

About

Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust, practical, publicly available models. This paper describes scispaCy, a new tool for practical biomedical/scientific text processing, which heavily leverages the spaCy library. We detail the performance of two packages of models released in scispaCy and demonstrate their robustness on several tasks and datasets. Models and code are available at https://allenai.github.io/scispacy/

Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar • 2019

Related benchmarks

Task                      Dataset                             Metric              Result  Rank
Entity Linking            MM-ST21PV english (test)            Recall@1            53.8    11
Entity Linking            QUAERO-MEDLINE french (test)        Recall@1            40.5    11
Entity Linking            QUAERO-EMEA french (test)           Recall@1            37.1    11
Entity Linking            SPACCC spanish (test)               Recall@1            13.2    11
Named Entity Recognition  NCBI-Disease preprocessed (test)    Micro F1 (excl. O)  81.65   4
Named Entity Recognition  BC5CDR preprocessed (test)          Micro F1 (excl. O)  83.92   4
Named Entity Recognition  BC4CHEMD preprocessed (test)        Micro F1 (excl. O)  84.55   4
Named Entity Recognition  Linnaeus preprocessed (test)        Micro F1 (excl. O)  81.74   4
Named Entity Recognition  Species800 preprocessed (test)      Micro F1 (excl. O)  74.06   4
Named Entity Recognition  JNLPBA preprocessed (test)          Micro F1 (excl. O)  73.21   4
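
The listing does not say how the two metrics are computed, so the following is only a minimal sketch under stated assumptions: Recall@1 is taken to be the fraction of mentions whose top-ranked candidate matches the gold concept, and "Micro F1 (excl. O)" is taken to be token-level micro-averaged F1 pooled over every tag except the outside tag `O`. The function names `recall_at_1` and `micro_f1_excluding_o` are hypothetical and are not part of scispaCy or of the benchmark's tooling.

```python
def recall_at_1(gold_ids, ranked_candidates):
    """Fraction of mentions whose top-ranked candidate equals the gold ID.

    Assumption: one gold concept ID per mention, and one ranked candidate
    list per mention (an empty list counts as a miss).
    """
    if not gold_ids:
        return 0.0
    hits = sum(
        1
        for gold, candidates in zip(gold_ids, ranked_candidates)
        if candidates and candidates[0] == gold
    )
    return hits / len(gold_ids)


def micro_f1_excluding_o(gold_tags, pred_tags, outside="O"):
    """Token-level micro F1 pooled over all tags except the outside tag.

    Assumption: scoring is per token, not per entity span.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_tags, pred_tags):
        if gold == pred:
            if gold != outside:
                tp += 1  # correct non-O token
        else:
            if pred != outside:
                fp += 1  # predicted a wrong non-O tag
            if gold != outside:
                fn += 1  # missed a gold non-O tag
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not benchmark data.
    gold = ["B-Disease", "I-Disease", "O", "O", "B-Chemical"]
    pred = ["B-Disease", "O", "O", "B-Chemical", "B-Chemical"]
    print(micro_f1_excluding_o(gold, pred))
    print(recall_at_1(["C1", "C2"], [["C1", "C9"], ["C7", "C2"]]))
```

Under these assumptions, agreement on `O` tokens never raises the score, so a tagger that labels every token `O` scores 0 rather than benefiting from the majority class.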
