
Insights into Pre-training via Simpler Synthetic Tasks

About

Pre-training produces representations that are effective for a wide range of downstream tasks, but it is still unclear what properties of pre-training are necessary for effective gains. Notably, recent work shows that even pre-training on synthetic tasks can achieve significant gains in downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its gains. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks. We find the best synthetic pre-training method, LIME, attains an average of $67\%$ of the benefits of natural pre-training. Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves $65\%$ of the benefits, almost matching LIME. Third, we find that $39\%$ of the benefits can be attained by using merely the parameter statistics of synthetic pre-training. We release the source code at https://github.com/felixzli/synthetic_pretraining.
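
To make the second experiment more concrete, below is a minimal sketch of how training examples for a Set-style synthetic task could be generated: the input is a random token sequence and the target is its unique tokens. The function name `make_set_example`, the vocabulary size, the length range, and the ordering of the target are illustrative assumptions and not taken from the paper's released code.

```python
# Hypothetical sketch of a "Set"-style synthetic pre-training example generator.
# Vocabulary size, sequence lengths, and target ordering are assumptions for
# illustration, not the released implementation.
import random


def make_set_example(vocab_size=100, min_len=5, max_len=20, rng=random):
    """Sample a random token sequence and pair it with its unique tokens.

    The target keeps only the first occurrence of each token, so a
    sequence-to-sequence model must read the whole input and deduplicate it.
    """
    length = rng.randint(min_len, max_len)
    source = [rng.randrange(vocab_size) for _ in range(length)]
    seen = set()
    target = []
    for tok in source:
        if tok not in seen:
            seen.add(tok)
            target.append(tok)
    return source, target


if __name__ == "__main__":
    src, tgt = make_set_example()
    print("source:", src)
    print("target:", tgt)
```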

Yuhuai Wu, Felix Li, Percy Liang • 2022

Related benchmarks

Task                          | Dataset                            | Metric           | Result | Rank
Machine Reading Comprehension | SQuAD 1.1 (test)                   | EM               | 48.3   | 46
Semantic Parsing              | mTOP (test)                        | --               | --     | 17
Code Translation              | Code Trans. (test)                 | Exact Match (EM) | 59.4   | 8
Pre-training Evaluation       | Aggregated Downstream Tasks (test) | Average EM       | 54.2   | 8
Retrosynthesis                | USPTO Retrosynthesis 50K (test)    | EM               | 40.9   | 8
Semantic Parsing              | WEBQSP (test)                      | EM               | 72.2   | 8
Summarization                 | CNNDM 10K (test)                   | ROUGE-1          | 32.8   | 8

Other info

Code: https://github.com/felixzli/synthetic_pretraining
