BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
About
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7 percentage point absolute improvement), MultiNLI accuracy to 86.7% (4.6 percentage point absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement), and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
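To make "jointly conditioning on both left and right context" concrete, the following toy sketch compares which positions each token may attend to under a left-to-right constraint versus a bidirectional one. It is an illustration only, not the paper's implementation: the function names `visibility_mask` and `context_sizes` and the four-token input are hypothetical, and no actual attention computation is performed.

```python
def visibility_mask(seq_len, bidirectional):
    """Return a seq_len x seq_len matrix of booleans.

    Entry [i][j] is True when position i may attend to position j.
    With bidirectional=False, position i sees only positions j <= i
    (left context plus itself); with bidirectional=True it sees all.
    """
    return [
        [bidirectional or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def context_sizes(mask):
    """Count how many positions each token can condition on."""
    return [sum(row) for row in mask]


tokens = ["the", "cat", "sat", "down"]  # hypothetical toy input

left_to_right = visibility_mask(len(tokens), bidirectional=False)
both_sides = visibility_mask(len(tokens), bidirectional=True)

for name, mask in [("left-to-right", left_to_right),
                   ("bidirectional", both_sides)]:
    print(name, context_sizes(mask))
```

In the left-to-right case the first token conditions on one position and the last on four, while in the bidirectional case every token conditions on all four positions, which is the property the abstract attributes to every layer of BERT.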
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | -- | -- | 3381 |
| Language Modeling | WikiText-2 (test) | PPL | 69.32 | 1541 |
| Image Classification | ImageNet-1K | Top-1 Acc | 83.3 | 836 |
| Node Classification | Cora (test) | Mean Accuracy | 69.1 | 687 |
| Natural Language Inference | SNLI (test) | Accuracy | 91.6 | 681 |
| Commonsense Reasoning | PIQA | Accuracy | 66.7 | 647 |
| Named Entity Recognition | CoNLL 2003 (test) | F1 Score | 92.8 | 539 |
| Language Modeling | WikiText-103 (test) | Perplexity | 107.3 | 524 |
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc) | 94.9 | 504 |
| Image Classification | EuroSAT | Accuracy | 98.8 | 497 |