| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Coreference Resolution | Winogender (test) | Accuracy 80.7 | 11 |
| Coreference Resolution | Winogender (WG) (test) | Accuracy 80.8 | 11 |
| Bias Evaluation | WinoGender | EBS 0.068 | 8 |
| Commonsense Reasoning | WinoGender | Accuracy 0.9681 | 8 |
| Multiple-choice scoring | Winogender | Accuracy 0.847 | 7 |
| Co-reference resolution | WinoGender | Accuracy (All) 77.5 | 4 |
| Pronoun Coreference Resolution | Winogender (test) | Accuracy 64.5 | 3 |
| Coreference Resolution | Winogender | Accuracy 62.9 | 3 |