Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning
About
Masked language models (MLMs) are pre-trained with a denoising objective that is mismatched with the objective of downstream fine-tuning. We propose pragmatic masking and surrogate fine-tuning as two complementary strategies that exploit social cues to drive pre-trained representations toward a broad set of concepts useful for a wide class of social meaning tasks. We test our models on 15 different Twitter datasets for social meaning detection. Our methods achieve a 2.34% average F1 improvement over a competitive baseline, while outperforming domain-specific language models pre-trained on large datasets. Our methods also excel in few-shot learning: with only 5% of the training data (severely few-shot), they achieve an impressive 68.54% average F1. The methods are also language agnostic, as we show in a zero-shot setting involving six datasets from three different languages.
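The paper itself defines the exact masking scheme; as a rough illustration only, the sketch below shows one plausible reading of "pragmatic masking": instead of masking tokens uniformly at random as in standard MLM pre-training, preferentially mask tokens that carry social cues (hashtags, user mentions, emojis). The function name, the cue regex, and the `cue_bias` parameter are all assumptions for illustration, not the authors' implementation.

```python
import random
import re

# Hypothetical social-cue matcher: hashtags, @-mentions, and emoji codepoints.
SOCIAL_CUE = re.compile(r"^(#\w+|@\w+|[\U0001F300-\U0001FAFF])$")

def pragmatic_mask(tokens, mask_rate=0.15, cue_bias=0.8, seed=0):
    """Replace ~mask_rate of `tokens` with [MASK], drawing a `cue_bias`
    fraction of the masks from social-cue tokens before falling back to
    the remaining tokens (illustrative sketch, not the paper's method)."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    cue_idx = [i for i, t in enumerate(tokens) if SOCIAL_CUE.match(t)]
    other_idx = [i for i in range(len(tokens)) if i not in set(cue_idx)]
    rng.shuffle(cue_idx)
    rng.shuffle(other_idx)
    n_cue = min(len(cue_idx), round(n_mask * cue_bias))
    chosen = set(cue_idx[:n_cue] + other_idx[: n_mask - n_cue])
    return ["[MASK]" if i in chosen else t for i, t in enumerate(tokens)]

tokens = "so proud of this team #blessed @coach great win".split()
print(pragmatic_mask(tokens))
```

With a 15% mask rate over nine tokens, one token is masked, and the cue bias makes it land on `#blessed` or `@coach` rather than on a random content word.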
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic Textual Similarity | STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R), various (test) | STS12 Score: 50.07 | 393 |
| Transfer Learning | SentEval transfer tasks (test) | MR: 86.79 | 23 |
| Emotion Detection | EmoMoham v1 (test) | Macro F1: 81.25 | 14 |
| Out-of-domain performance average | Average Out-of-Domain | Macro F1: 75.25 | 14 |
| Crisis Classification | CrisisOltea v1 (test) | Macro F1: 95.89 | 14 |
| Twitter dataset performance average | Average In-Domain | Macro F1: 77.71 | 14 |
| Hate Speech Detection | HateWas v1 (test) | Macro F1: 57.05 | 14 |