SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
About
Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many existing state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to the limited data available for downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the downstream data and forget the knowledge of the pre-trained model. To address this issue in a more principled manner, we propose a new computational framework for robust and efficient fine-tuning of pre-trained language models. Specifically, our proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the capacity of the model; 2. Bregman proximal point optimization, which is a class of trust-region methods and can prevent knowledge forgetting. Our experiments demonstrate that the proposed method achieves state-of-the-art performance on multiple NLP benchmarks.
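The two ingredients combine into a single training objective: the task loss, plus a smoothness-inducing term penalizing output changes under small input perturbations, plus a Bregman proximal term keeping each iterate's predictions close to the previous iterate's. A minimal NumPy sketch with a toy linear classifier standing in for the language model (SMART computes the perturbation adversarially via projected gradient ascent; a random perturbation is used here for brevity, and all names and hyperparameter values are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def model(x, W):
    # Toy linear classifier standing in for the fine-tuned language model.
    return softmax(x @ W)

def symmetric_kl(p, q, eps=1e-12):
    # Symmetrized KL divergence between two batches of distributions.
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q) + q * np.log(q / p)))

def smart_objective(x, y, W, W_prev, lam=1.0, mu=1.0, epsilon=1e-3, seed=0):
    """Task loss + smoothness regularizer + Bregman proximal point term."""
    probs = model(x, W)
    # Standard cross-entropy task loss.
    task_loss = -float(np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))
    # Smoothness-inducing term: divergence between outputs on clean and
    # slightly perturbed inputs (adversarial in SMART, random here).
    rng = np.random.default_rng(seed)
    delta = rng.normal(size=x.shape)
    delta = epsilon * delta / np.linalg.norm(delta)
    smooth = symmetric_kl(probs, model(x + delta, W))
    # Bregman proximal term: keep current predictions close to those of the
    # previous iterate W_prev, acting as a trust region against forgetting.
    breg = symmetric_kl(probs, model(x, W_prev))
    return task_loss + lam * smooth + mu * breg
```

In a training loop, `W_prev` would be refreshed to the current weights at the start of each proximal-point iteration, so the Bregman term only constrains movement within an iteration.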
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | SNLI (test) | Accuracy | 91.7 | 681 |
| Natural Language Understanding | GLUE (dev) | SST-2 Accuracy | 96.9 | 504 |
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy | 97.5 | 416 |
| Question Classification | TREC | Accuracy | 68.17 | 205 |
| Text Classification | AGNews | Accuracy | 86.12 | 119 |
| Natural Language Inference | SciTail (test) | Accuracy | 95.2 | 86 |
| Natural Language Inference | SNLI (dev) | Accuracy | 92.6 | 71 |
| Sentiment Classification | IMDB | Accuracy | 86.98 | 41 |
| Word Sense Disambiguation | WiC (dev) | Accuracy | 63.55 | 32 |
| Natural Language Inference | ANLI (test) | Overall Score | 57.1 | 28 |