Debiasing Pre-trained Contextualised Embeddings

About

In comparison to the numerous debiasing methods proposed for static non-contextualised word embeddings, the discriminative biases in contextualised embeddings have received relatively little attention. We propose a fine-tuning method that can be applied at the token or sentence level to debias pre-trained contextualised embeddings. The proposed method can be applied to any pre-trained contextualised embedding model without requiring those models to be retrained. Using gender bias as an illustrative example, we then conduct a systematic study of several state-of-the-art (SoTA) contextualised representations on multiple benchmark datasets, evaluating the level of bias encoded in different contextualised embeddings before and after debiasing with the proposed method. We find that applying token-level debiasing to all tokens and across all layers of a contextualised embedding model produces the best performance. Interestingly, we observe a trade-off between creating an accurate and an unbiased contextualised embedding model, and different contextualised embedding models respond differently to this trade-off.
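The abstract does not spell out the training objective, so the following is only a rough sketch of how a fine-tuning loss could balance debiasing against accuracy. It assumes (this is an illustration, not necessarily the authors' formulation) that bias is penalised via the squared inner product between embeddings of target tokens and embeddings of protected-attribute words, and that accuracy is protected by keeping fine-tuned embeddings close to the original ones. All function names and the weights `alpha` and `beta` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def debias_loss(target_embs: Sequence[Vector],
                attribute_embs: Sequence[Vector]) -> float:
    """Assumed bias term: sum of squared inner products between every
    target-token embedding and every attribute-word embedding.
    It is zero when all targets are orthogonal to all attributes."""
    return sum(dot(a, t) ** 2 for t in target_embs for a in attribute_embs)


def preservation_loss(tuned_embs: Sequence[Vector],
                      original_embs: Sequence[Vector]) -> float:
    """Assumed regulariser: squared L2 distance between the fine-tuned
    embeddings and the original pre-trained embeddings."""
    return sum((x - y) ** 2
               for tuned, orig in zip(tuned_embs, original_embs)
               for x, y in zip(tuned, orig))


def total_loss(target_embs: Sequence[Vector],
               attribute_embs: Sequence[Vector],
               tuned_embs: Sequence[Vector],
               original_embs: Sequence[Vector],
               alpha: float = 0.5,
               beta: float = 0.5) -> float:
    """Weighted sum of the two terms; alpha and beta are illustrative
    values, not settings taken from the paper."""
    return (alpha * debias_loss(target_embs, attribute_embs)
            + beta * preservation_loss(tuned_embs, original_embs))
```

In a sketch like this, raising `alpha` relative to `beta` pushes harder on removing bias at the cost of drifting further from the pre-trained representations, which is one way to picture the accuracy versus bias trade-off the abstract reports.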

Masahiro Kaneko, Danushka Bollegala • 2021

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Question Answering	BBQ Gender	Accuracy	72	36
Stereotype Bias Evaluation	StereoSet Gender	LMS Score	84.42	24
Question Answering	BBQ Race	Accuracy	68.2	18
Question Answering	BBQ Nationality	Accuracy	69.6	18
Stereotype Bias Evaluation	StereoSet Overall	LMS	58.04	8
Bias Evaluation	CrowS-Pairs	CP-S Score	47.71	6
Question Answering	BBQ Overall Llama-3	Accuracy	64.2	6
