
Publicly Available Clinical BERT Embeddings

About

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly-available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific model yields performance improvements on three common clinical NLP tasks as compared to nonspecific embeddings. We find that these domain-specific models are not as performant on two clinical de-identification tasks, and argue that this is a natural consequence of the differences between de-identified source text and synthetically non de-identified task text.

Emily Alsentzer, John R. Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, Matthew B. A. McDermott • 2019

Related benchmarks

| Task                       | Dataset                              | Metric          | Result | Rank |
|----------------------------|--------------------------------------|-----------------|--------|------|
| Natural Language Inference | MedNLI (test)                        | Accuracy        | 84.1   | 89   |
| Survival Prediction        | TCGA-BRCA (5-fold cross-validation)  | C-Index         | 0.7136 | 54   |
| Survival Prediction        | TCGA-KIRC (5-fold cross-validation)  | C-Index         | 0.7664 | 46   |
| Survival Prediction        | TCGA-LUAD (5-fold cross-validation)  | C-Index         | 0.6308 | 46   |
| Survival Analysis          | UCEC TCGA (5-fold cross-validation)  | C-Index         | 0.7318 | 28   |
| Named Entity Recognition   | MIMIC-III (test)                     | Strict F1       | 81     | 26   |
| Named Entity Recognition   | UTP Two-site experiment (test)       | Strict Micro F1 | 82.3   | 26   |
| Named Entity Recognition   | MTSamples (test)                     | Strict F1       | 83.8   | 26   |
| Relation Extraction        | MIMIC-III (test)                     | Strict F1       | 59.8   | 26   |
| Relation Extraction        | UTP Two-site experiment (test)       | Strict Micro F1 | 0.413  | 26   |

Showing 10 of 60 rows.
