Better Fine-Tuning by Reducing Representational Collapse
About
Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampled from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning where possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including CNN/DailyMail, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse: the pre-trained models retain more generalizable representations each time they are fine-tuned.
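The core idea in the abstract, replacing adversarial perturbations with sampled parametric noise, can be sketched as a consistency regularizer: penalize the divergence between the model's predictions on clean embeddings and on embeddings perturbed with small random noise. The following is a minimal NumPy illustration, not the authors' implementation; `noise_regularizer`, `embed`, and `head` are hypothetical names, and the symmetric KL penalty shown here is one common instantiation of this style of objective.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, numerically stabilized."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps)))
                 + np.sum(q * np.log((q + eps) / (p + eps))))

def noise_regularizer(embed, head, sigma=1e-5, dist="normal", rng=None):
    """Consistency penalty: divergence between predictions on clean
    embeddings and embeddings perturbed with parametric noise
    (normal or uniform, as in the abstract)."""
    rng = np.random.default_rng(rng)
    if dist == "normal":
        noise = rng.normal(0.0, sigma, size=embed.shape)
    else:
        noise = rng.uniform(-sigma, sigma, size=embed.shape)
    p = softmax(head(embed))          # predictions on clean input
    q = softmax(head(embed + noise))  # predictions on noised input
    return symmetric_kl(p, q)

# Toy example: a random linear classification head over 8-dim embeddings.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
head = lambda e: e @ W
embed = rng.normal(size=(4, 8))  # batch of 4 embeddings
reg = noise_regularizer(embed, head, sigma=1e-5, rng=1)
```

In a fine-tuning loop, `reg` would be added (scaled by a hyper-parameter) to the task loss, so that small input perturbations cannot swing the model's predictions; with a tiny `sigma`, the penalty stays near zero unless the representation becomes locally unstable.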
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc): 97.1 | 504 |
| Natural Language Understanding | GLUE (val) | -- | 170 |
| Summarization | Gigaword (test) | ROUGE-2: 20.7 | 38 |
| Natural Language Inference | XNLI 1.0 (test) | Accuracy: 81.4 | 38 |
| Abstractive Summarization | CNN/DailyMail | ROUGE-1: 44.38 | 25 |
| Binary Classification | AdvGLUE (test) | QNLI Accuracy: 0.475 | 17 |
| Summarization | CNN-DM (test) | ROUGE-1: 37.28 | 11 |
| Summarization | Reddit TIFU Long (test) | ROUGE-1: 30.31 | 4 |
| Summarization | Reddit TIFU Long | ROUGE-1: 30.31 | 3 |