How to Fine-Tune BERT for Text Classification?
About
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved impressive results on many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. The proposed solution obtains new state-of-the-art results on eight widely studied text classification datasets.
Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang · 2019
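One of the fine-tuning strategies the paper investigates is a layer-wise decreasing learning rate: the top encoder layer trains at the base rate, and each lower layer's rate is scaled down by a constant decay factor (the paper reports a base rate of 2e-5 with a decay factor of 0.95 working well). A minimal sketch of that schedule in plain Python — the function name and the 12-layer default (BERT-base) are illustrative, not taken from the paper's code:

```python
def layerwise_lrs(num_layers=12, base_lr=2e-5, decay=0.95):
    """Learning rate per encoder layer, ordered bottom (index 0) to top.

    The top layer trains at base_lr; every layer below is scaled down
    by `decay`, so layer k gets base_lr * decay ** (num_layers - 1 - k).
    """
    return [base_lr * decay ** (num_layers - 1 - k) for k in range(num_layers)]

rates = layerwise_lrs()
print(f"top layer lr:    {rates[-1]:.2e}")  # 2.00e-05
print(f"bottom layer lr: {rates[0]:.2e}")   # 1.14e-05
```

In a real training setup these per-layer rates would typically be passed to the optimizer as parameter groups (e.g. one `{"params": ..., "lr": ...}` dict per transformer layer in PyTorch's `AdamW`).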
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Understanding | GLUE (dev) | SST-2 Accuracy | 93.2 | 504 |
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy | 93.5 | 416 |
| Question Answering | SQuAD v1.1 (dev) | F1 Score | 88.5 | 375 |
| Text Classification | AG News (test) | -- | -- | 210 |
| Question Answering | SQuAD v2.0 (dev) | F1 Score | 76.3 | 158 |
| Sentiment Classification | IMDB (test) | Error Rate | 4.21 | 144 |
| Text Classification | Yahoo! Answers (test) | -- | -- | 133 |
| Text Classification | TREC (test) | -- | -- | 113 |
| Machine Reading Comprehension | RACE (test) | Accuracy (Medium) | 71.7 | 111 |
| Text Classification | DBPedia (test) | Test Error Rate | 0.0061 | 40 |
Showing 10 of 27 benchmark results.