Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
About
We introduce a new benchmark, WinoBias, for coreference resolution focused on gender bias. Our corpus contains Winograd-schema-style sentences with entities corresponding to people referred to by their occupation (e.g., the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than to anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing coreference benchmark datasets. Our dataset and code are available at http://winobias.org.
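The pro- versus anti-stereotypical gap reported above can be read as a simple difference of F1 scores, averaged over systems. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the function names (`f1`, `bias_gap`, `average_gap`), the link-level F1 computed from raw counts, and the system names and scores in `toy_scores` are all assumptions made for illustration and are not results from the paper.

```python
from statistics import mean
from typing import Mapping, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 in percentage points from true-positive, false-positive and
    false-negative counts. Assumption: a plain link-level F1; standard
    coreference scorers combine several metrics instead."""
    denom = 2 * tp + fp + fn
    return 100.0 * 2 * tp / denom if denom else 0.0


def bias_gap(pro_f1: float, anti_f1: float) -> float:
    """Pro-stereotypical F1 minus anti-stereotypical F1 for one system."""
    return pro_f1 - anti_f1


def average_gap(scores: Mapping[str, Tuple[float, float]]) -> float:
    """Mean gap over systems; each value is a (pro_f1, anti_f1) pair."""
    return mean(bias_gap(pro, anti) for pro, anti in scores.values())


# Hypothetical placeholder scores, NOT numbers from the paper.
toy_scores = {
    "system_a": (80.0, 60.0),
    "system_b": (70.0, 55.0),
    "system_c": (75.0, 50.0),
}

if __name__ == "__main__":
    print(f"average pro/anti F1 gap: {average_gap(toy_scores):.1f}")
```

A gap near zero would indicate that a system resolves pronouns equally well in both conditions; a large positive gap indicates reliance on occupational stereotypes.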
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Gender Bias Mitigation | Multilingual CrowS-Pairs gender-sensitive attributes | Bias Score (DE): 1.37 | 18 |
| Racial Bias Evaluation | Multilingual CrowS-Pairs racial bias | Bias Score (DE): 15.56 | 18 |
| Religious Bias Evaluation | Multilingual CrowS-Pairs (test) | Bias Score (DE): 16.67 | 18 |