SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
About
Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many existing state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to the limited data available for downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the downstream data and forget the knowledge of the pre-trained model. To address this issue in a more principled manner, we propose a new computational framework for robust and efficient fine-tuning of pre-trained language models. Specifically, our proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the capacity of the model; 2. Bregman proximal point optimization, which is a class of trust-region methods and can prevent knowledge forgetting. Our experiments demonstrate that the proposed method achieves state-of-the-art performance on multiple NLP benchmarks.
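The two ingredients combine into a single training objective: the task loss, plus a smoothness-inducing term penalizing output changes under small input perturbations, plus a Bregman proximal term keeping each iterate's predictions close to the previous iterate's. A minimal NumPy sketch with a toy linear classifier standing in for the language model (SMART computes the perturbation adversarially via projected gradient ascent; a random perturbation is used here for brevity, and all names and hyperparameter values are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def model(x, W):
    # Toy linear classifier standing in for the fine-tuned language model.
    return softmax(x @ W)

def symmetric_kl(p, q, eps=1e-12):
    # Symmetrized KL divergence between two batches of distributions.
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q) + q * np.log(q / p)))

def smart_objective(x, y, W, W_prev, lam=1.0, mu=1.0, epsilon=1e-3, seed=0):
    """Task loss + smoothness regularizer + Bregman proximal point term."""
    probs = model(x, W)
    # Standard cross-entropy task loss.
    task_loss = -float(np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))
    # Smoothness-inducing term: divergence between outputs on clean and
    # slightly perturbed inputs (adversarial in SMART, random here).
    rng = np.random.default_rng(seed)
    delta = rng.normal(size=x.shape)
    delta = epsilon * delta / np.linalg.norm(delta)
    smooth = symmetric_kl(probs, model(x + delta, W))
    # Bregman proximal term: keep current predictions close to those of the
    # previous iterate W_prev, acting as a trust region against forgetting.
    breg = symmetric_kl(probs, model(x, W_prev))
    return task_loss + lam * smooth + mu * breg
```

In a training loop, `W_prev` would be refreshed to the current weights at the start of each proximal-point iteration, so the Bregman term only constrains movement within an iteration.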
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | SNLI (test) | Accuracy | 91.7 | 681 |
| Natural Language Understanding | GLUE (dev) | SST-2 Accuracy | 96.9 | 504 |
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy | 97.5 | 416 |
| Question Classification | TREC | Accuracy | 68.17 | 205 |
| Text Classification | AGNews | Accuracy | 86.12 | 119 |
| Natural Language Inference | SciTail (test) | Accuracy | 95.2 | 86 |
| Natural Language Inference | SNLI (dev) | Accuracy | 92.6 | 71 |
| Sentiment Classification | IMDB | Accuracy | 86.98 | 41 |
| Word Sense Disambiguation | WiC (dev) | Accuracy | 63.55 | 32 |
| Natural Language Inference | ANLI (test) | Overall Score | 57.1 | 28 |