Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
## About
Large-scale pre-trained language models have recently achieved impressive results on a wide range of downstream tasks. However, fine-tuning an extremely large-scale pre-trained language model on limited target datasets is often plagued by overfitting and representation degradation. In this paper, we propose a Dynamic Parameter Selection (DPS) algorithm for fine-tuning large-scale pre-trained models, which adaptively selects a more promising subnetwork to perform staging updates based on gradients from back-propagation. Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results across various pre-trained language models. In addition, DPS yields large improvements in out-of-domain transfer experiments and low-resource scenarios, which shows that it can maintain stable general contextual features and mitigate representation collapse. We release our code at https://github.com/ZhangHaojie077/DPS.
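To make the core idea concrete, below is a minimal sketch of gradient-based subnetwork selection during fine-tuning. It is not the authors' implementation (see the linked repository for that); the function name, the `keep_ratio` parameter, and the use of gradient magnitude as the selection criterion are illustrative assumptions.

```python
import torch
import torch.nn as nn


def select_subnetwork_gradients(model: nn.Module, keep_ratio: float = 0.3) -> None:
    """Illustrative sketch: after loss.backward(), keep only the fraction of
    each parameter tensor's gradients with the largest magnitude and zero
    the rest, so the following optimizer step updates a subnetwork rather
    than the full model. `keep_ratio` and the magnitude criterion are
    assumptions for illustration; DPS's actual selection strategy and
    staging schedule are defined in the paper and repository.
    """
    for param in model.parameters():
        if param.grad is None:
            continue
        flat = param.grad.abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        # Threshold at the k-th largest gradient magnitude in this tensor.
        threshold = torch.kthvalue(flat, flat.numel() - k + 1).values
        mask = (param.grad.abs() >= threshold).to(param.grad.dtype)
        param.grad.mul_(mask)
```

In a training loop, such a hook would sit between the backward pass and the optimizer step, e.g. `loss.backward(); select_subnetwork_gradients(model); optimizer.step()`, so that only the selected subnetwork receives updates at each stage.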
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | RTE | Accuracy | 73.16 | 367 |
| Natural Language Inference | SNLI | Accuracy | 84.83 | 174 |
| Natural Language Understanding | GLUE (val) | -- | -- | 170 |
| Natural Language Inference | MNLI (matched) | Accuracy | 79.16 | 110 |
| Natural Language Inference | MNLI | -- | -- | 80 |
| Natural Language Inference | SICK | Accuracy | 58.18 | 15 |
| Natural Language Inference | SciTail | Accuracy | 80.36 | 8 |