| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Fair Generation | WinoBias Race-Pro (extended) | Deviation Ratio | 0.08 | 20 |
| Fair Generation | WinoBias Race (standard) | Deviation Ratio | 0.04 | 20 |
| Fair Generation | WinoBias Gender-Pro extended | Deviation Ratio | 0.07 | 20 |
| Fair Generation | WinoBias Gender (standard) | Deviation Ratio | 100 | 20 |
| Out-of-Domain (OOD) Bias Evaluation | Winobias | Accuracy | 0.507 | 14 |
| Stereotype Fairness Identification | WinoBias cloze-style (test) | P_stereo | 43.18 | 14 |
| Influence Estimation | WinoBias (test) | Spearman Correlation | 0.854 | 14 |
| Hallucination Detection | Winobias (test) | AUROC | 54.43 | 10 |
| Gender Bias in Coreference Resolution | WinoBias | P(Stereo) | 49.49 | 7 |
| Coreference Resolution | WinoBias syntax-type-2 | ECE | 0.154 | 6 |
| Text-to-Image Debiasing | Winobias | Librarian Score | 86 | 6 |
| Attribute Presence Measurement | WinoBias (Prof) | Attribute Presence | 94.1 | 4 |
| Gender Bias Mitigation | WinoBias 40 occupations | Bias | 0.146 | 4 |
| Gender-fair rewriting | WinoBias+ (test) | Tokenised WER | 0.04 | 4 |
| Image-to-Image Editing | WinoBias adapted for I2I editing (test) | Edit Success Rate | 93.9 | 3 |
| Fair Image Generation (Race) | Extended Winobias Race+ (test) | Analyst | 77 | 3 |
| Fair Image Generation (Race) | Winobias original (test) | Analyst | 82 | 3 |
| Fair Image Generation (Gender) | Extended Winobias Gender+ (test) | Analyst Performance | 54 | 3 |
| Fair Image Generation (Gender) | Winobias original (test) | Analyst | 70 | 3 |
| Coreference Resolution | WinoBias (test) | Accuracy | 85.1 | 2 |