| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Bias Measurement | StereoSet | Overall SS 63.17 | 25 |
| Stereotype Bias Evaluation | StereoSet Gender | LMS Score 85.6 | 15 |
| Out-of-Domain (OOD) Bias Evaluation | Stereoset | Accuracy 67.2 | 14 |
| Reasoning-intensive classification | StereoSet (test) | Macro F1 Score 93 | 12 |
| Stereotypical Bias Evaluation | StereoSet (dev) | Overall LMS Score 84.172 | 12 |
| Stereotype Bias Evaluation | StereoSet (test) | Gender SS 77.12 | 8 |
| Stereotype Detection | StereoSet n=237 | Accuracy 93.4 | 5 |
| Language Model Debiasing | StereoSet (test) | LMS Score 0.8535 | 5 |
| Bias Evaluation | StereoSet intrasentence | Gender SS 67.34 | 3 |
| Stereotype Bias Evaluation | StereoSet Overall | LMS 77.6 | 2 |
| Stereotype Bias Evaluation | StereoSet Race | LMS 77 | 2 |
| Stereotype Bias Evaluation | StereoSet Religion | LMS 84 | 2 |
| Stereotype Bias Evaluation | StereoSet Profession | LMS 78.4 | 2 |