| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| BBQ | C2PO | Accuracy | 99.3 | 99 | 3d ago |
| BBQ averaged across gender, nationality, and religion domains | Self-Debiasing | Accuracy (Ambiguous) | 87.73 | 16 | 3d ago |
| SOCT | Llama 3.1 8B - LFT w. SH-N (baseline 3) | DR (Female Stereotype) | 0.054 | 15 | 3d ago |
| Crows-pairs | Qwen 3 0.6B - LFT w. SH (baseline 2) | Pct Stereotype | 51.25 | 15 | 3d ago |
| Honest | Llama 3.1 8B - LFT w. SH-Dgender (BaseCDA) | Honest Score | 11.7 | 15 | 3d ago |
| Reddit Bias | Llama 3.1 8B - Pretrained model (baseline 1) | t-value | -4.7523 | 15 | 3d ago |
| Male-biased prompts | Manually curated | Male Bias (Base) | 0.53 | 14 | 3d ago |
| CrowS-Pairs | CDA | CS Score | 50.01 | 13 | 3d ago |
| HolisticBias | PaCE | GN Score | 66.2 | 10 | 3d ago |
| Crow-S | | Score | 57.25 | 9 | 3d ago |
| WinoGender | | EBS | 0.068 | 8 | 3d ago |
| KoBBQ | Exaone-3.5-7.8B-inst | Ambiguous Context Score | 85.91 | 5 | 3d ago |
| StereoSet intrasentence | CodeGen-Multi-16B | Gender SS | 67.34 | 3 | 3d ago |
| BBQ Disambiguated | LLaMA-3.1 8B | Bias Score Before | 90.07 | 1 | 3d ago |