BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
About
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7 percentage point absolute improvement), MultiNLI accuracy to 86.7% (4.6 percentage point absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
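Two ideas in the abstract can be illustrated with a small sketch: conditioning on both left and right context, and fine-tuning with a single additional output layer. The code below is not BERT's implementation. The function names (`attention_mask`, `classify`), the toy dimensions, and the weights are all hypothetical, chosen only to show the contrast between a left-to-right visibility pattern and a bidirectional one, and what a lone linear-plus-softmax layer over a pooled encoder vector might look like.

```python
import math

def attention_mask(seq_len, bidirectional):
    """Return a seq_len x seq_len visibility matrix (illustrative only).

    Entry [i][j] is 1 if position i may attend to position j.
    A left-to-right model sees only j <= i; a bidirectional
    encoder sees every position.
    """
    return [
        [1 if bidirectional or j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(pooled, weights, bias):
    """Hypothetical task head: one linear layer followed by softmax.

    `pooled` stands in for a fixed-size vector produced by a
    pre-trained encoder; `weights` has one row per class.
    """
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

if __name__ == "__main__":
    for row in attention_mask(3, bidirectional=False):
        print(row)  # lower-triangular: each token sees only its left context
    for row in attention_mask(3, bidirectional=True):
        print(row)  # all ones: each token sees left and right context
    # Toy 2-dimensional "encoder output" and a 2-class head (made-up numbers).
    print(classify([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In this sketch only `classify` would be new for a downstream task, which is the sense in which the abstract says no substantial task-specific architecture is needed.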
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | -- | -- | 3381 |
| Language Modeling | WikiText-2 (test) | PPL | 69.32 | 1949 |
| Image Classification | ImageNet-1K | Top-1 Acc | 83.3 | 1239 |
| Node Classification | Cora | Accuracy | 80.99 | 1215 |
| Node Classification | Cora (test) | Mean Accuracy | 69.1 | 861 |
| Commonsense Reasoning | PIQA | Accuracy | 66.7 | 751 |
| Natural Language Inference | SNLI (test) | Accuracy | 91.6 | 690 |
| Image Classification | Stanford Cars | Accuracy | 94.4 | 635 |
| Language Modeling | WikiText-103 (test) | Perplexity | 107.3 | 579 |
| Image Classification | EuroSAT | Accuracy | 98.8 | 569 |