SpanBERT: Improving Pre-training by Representing and Predicting Spans
About
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even show gains on GLUE.
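The first idea above, masking contiguous random spans instead of independent tokens, can be sketched in a few lines. The sketch below is illustrative, not the released implementation: it assumes span lengths drawn from a geometric distribution (p = 0.2, clipped at 10, as described in the paper), disjoint spans, and a masking budget of about 15% of the tokens; the function name and parameters are our own.

```python
import random

def mask_spans(tokens, mask_token="[MASK]", budget_ratio=0.15,
               p=0.2, max_len=10, seed=0):
    """Illustrative span masking: replace contiguous spans with [MASK]
    until roughly budget_ratio of the tokens are masked."""
    rng = random.Random(seed)
    out = list(tokens)
    budget = max(1, int(len(out) * budget_ratio))
    masked = set()
    attempts = 0
    while len(masked) < budget and attempts < 100:
        attempts += 1
        # Span length ~ Geometric(p), clipped at max_len (skews short).
        length = 1
        while length < max_len and rng.random() > p:
            length += 1
        length = min(length, budget - len(masked))  # stay within budget
        start = rng.randrange(0, len(out) - length + 1)
        span = set(range(start, start + length))
        if span & masked:
            continue  # keep sampled spans disjoint
        masked |= span
    for i in masked:
        out[i] = mask_token
    return out, sorted(masked)
```

In the full method, the span boundary objective then predicts each token inside a masked span from the representations of the two boundary tokens plus a position embedding, rather than from the masked positions themselves.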
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy | 94.8 | 416 |
| Question Answering | SQuAD v1.1 (dev) | F1 | 94.6 | 375 |
| Question Answering | SQuAD v1.1 (test) | F1 | 94.6 | 260 |
| Relation Extraction | TACRED (test) | F1 | 70.8 | 194 |
| Question Answering | SQuAD v2.0 (dev) | F1 | 88.7 | 158 |
| Coreference Resolution | CoNLL English 2012 (test) | MUC F1 | 85.5 | 114 |
| Question Answering | NewsQA (dev) | F1 | 29.5 | 101 |
| Relation Extraction | TACRED | Micro F1 | 70.8 | 97 |
| Question Answering | SQuAD (dev) | F1 | 55.8 | 74 |
| Question Answering | Natural Questions (NQ) (dev) | F1 | 36.0 | 72 |