
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

About

Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on data-rich supervised tasks, such as natural language inference, we obtain additional performance improvements on the GLUE benchmark. Applying supplementary training on BERT (Devlin et al., 2018), we attain a GLUE score of 81.8, the state of the art (as of 02/24/2019) and a 1.4 point improvement over BERT. We also observe reduced variance across random restarts in this setting. Our approach yields similar improvements when applied to ELMo (Peters et al., 2018a) and Radford et al. (2018)'s model. In addition, the benefits of supplementary training are particularly pronounced in data-constrained regimes, as we show in experiments with artificially limited training data.
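The recipe described above is a three-stage pipeline: unsupervised pretraining, supplementary training on a data-rich intermediate labeled task (such as MNLI), and finally fine-tuning on the target task. A minimal sketch of that staging, with toy stand-in functions (the names, step counts, and "weights" here are illustrative, not the paper's actual training code):

```python
# Toy illustration of the STILTs training order. A real implementation would
# update encoder parameters with gradient descent; here each stage simply
# records which task touched the encoder, to make the pipeline order explicit.

def train_stage(encoder_state, task_name, steps):
    """Stand-in for a training loop on one task."""
    return encoder_state + [(task_name, steps)]

def stilts_pipeline(target_task):
    state = []  # stand-in for encoder parameters
    # Stage 1: unsupervised pretraining (e.g., language modeling)
    state = train_stage(state, "language-modeling", steps=1_000_000)
    # Stage 2: supplementary training on an intermediate labeled task
    state = train_stage(state, "MNLI", steps=100_000)
    # Stage 3: fine-tuning on the target GLUE task
    state = train_stage(state, target_task, steps=10_000)
    return state

history = stilts_pipeline("RTE")
print([task for task, _ in history])
# → ['language-modeling', 'MNLI', 'RTE']
```

The key design point is that stage 2 is inserted between pretraining and target-task fine-tuning, rather than training on the intermediate and target tasks jointly.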

Jason Phang, Thibault Févry, Samuel R. Bowman • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Natural Language Understanding | GLUE (dev) | SST-2 (Acc) | 93.2 | 504
Natural Language Understanding | GLUE (test) | SST-2 (Acc) | 94.3 | 416
Text Classification | SST-2 (test) | Accuracy | 85.5 | 185
Domain Generalization | DomainBed (out-of-domain) | VLCS (Acc) | 77.7 | 38
Natural Language Understanding | GLUE 1.0 (test) | CoLA (MCC) | 47.2 | 25
Sentence Classification | MPQA (test) | Accuracy | 76.6 | 15
Sentence Classification | Subj full (test) | Accuracy | 83.2 | 9
Sentence Classification | MR full (test) | Accuracy | 81.9 | 9
Sentence Classification | CR full (test) | Accuracy | 84.7 | 9
Sentence Classification | IMDB full (test) | Accuracy | 86.9 | 9
(Showing 10 of 12 rows)
