Measuring and Reducing Gendered Correlations in Pre-trained Models

About

Pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode artifacts undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that it is possible for models with similar accuracy to encode correlations at very different rates. We show how measured correlations can be reduced with general-purpose techniques, and highlight the trade-offs of different strategies. With these results, we make recommendations for training robust models: (1) carefully evaluate unintended correlations, (2) be mindful of seemingly innocuous configuration differences, and (3) focus on general mitigations.

Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, Slav Petrov • 2020
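
As an illustration of the kind of measurement the abstract advocates, the sketch below (not the authors' code; the checkpoint and example pair are assumptions for illustration) computes a CrowS-Pairs-style stereotype score (SS), a metric reported in the benchmarks below: each sentence in a (stereotyping, anti-stereotyping) pair is scored with a masked language model's pseudo-log-likelihood, and SS is the percentage of pairs where the model prefers the stereotyping variant, so 50 indicates no measured preference. The published CrowS-Pairs metric is a refinement that conditions only on the tokens the two sentences share.

```python
# Minimal sketch of a CrowS-Pairs-style stereotype score (SS).
# Simplified for illustration; not the paper's evaluation code.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of each token's log-probability when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Hypothetical pair for illustration; real evaluation uses the
# CrowS-Pairs dataset.
pairs = [("Women are bad drivers.", "Men are bad drivers.")]
n_stereo = sum(pseudo_log_likelihood(s) > pseudo_log_likelihood(a)
               for s, a in pairs)
print(f"SS: {100 * n_stereo / len(pairs):.2f}  (50 = no preference)")
```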

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Counterfactual Input Evaluation | CrowS-Pairs | SS: 55.35 | 33
Stereotype Bias Evaluation | StereoSet Gender | LMS Score: 85.42 | 15
Gender Bias Evaluation | SEAT | SEAT 6: 0.912 | 13
Stereotypical Bias Evaluation | StereoSet (dev) | Overall LMS Score: 83.811 | 12
Bias Evaluation | Crow-S | Score: 55.977 | 9
Intrinsic Bias Evaluation | STS-B | StereoSet Score: 53.2 | 3
Intrinsic Bias Evaluation | NLI-bias | StereoSet Score: 53.2 | 3
Intrinsic Bias Evaluation | BiasBios | StereoSet Score: 53.2 | 3
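
One of the general-purpose mitigations the paper evaluates is counterfactual data augmentation (CDA): training text is duplicated with gendered terms swapped so the model sees both variants. Below is a minimal sketch, assuming a tiny illustrative swap list; real CDA uses a much larger curated lexicon and extra handling for ambiguous words such as "her" (possessive vs. object), which this sketch maps to a single replacement.

```python
# Minimal sketch of counterfactual data augmentation (CDA), one of the
# general-purpose mitigations the paper studies. The swap list is an
# illustrative stand-in for a much larger curated lexicon.
import re

SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "his",  # "her" is ambiguous; possessive assumed
    "his": "her",
    "man": "woman", "woman": "man",
}

_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)

def counterfactual(text: str) -> str:
    """Return a copy of `text` with gendered terms swapped, keeping case."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = SWAPS[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl
    return _PATTERN.sub(swap, text)

corpus = ["He gave his notes to the doctor."]
augmented = corpus + [counterfactual(s) for s in corpus]
print(augmented)
# ['He gave his notes to the doctor.', 'She gave her notes to the doctor.']
```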
