| Dataset Name | SOTA Method | Metric | Trend | Last Updated | |
|---|---|---|---|---|---|
| BBQ | C2PO | Accuracy: 99.3 | 171 | 6d ago | |
| Task 2 Persona F | | BAD Score: 0.3 | 25 | 1mo ago | |
| Task 2 Persona E | | BAD Score: 0.017 | 25 | 1mo ago | |
| Task 2 Persona D | | BAD Score: 0.007 | 25 | 1mo ago | |
| Task 2 Persona C | | BAD Score: -0.18 | 25 | 1mo ago | |
| Task 2 Persona B | | BAD Score: -0.007 | 25 | 1mo ago | |
| Task 2 Persona A | | BAD Score: -0.006 | 25 | 1mo ago | |
| Race & SES | iPASwo | Mean Improvement: 17 | 18 | 16d ago | |
| KoBBQ | | Ambiguous Context Score: 98.3 | 17 | 2mo ago | |
| BBQ averaged across gender, nationality, and religion domains | Self-Debiasing | Accuracy (Ambiguous): 87.73 | 16 | 3mo ago | |
| SOCT | Llama 3.1 8B - LFT w. SH-N (baseline 3) | DR (Female Stereotype): 0.054 | 15 | 3mo ago | |
| Crows-pairs | Qwen 3 0.6B - LFT w. SH (baseline 2) | Pct Stereotype: 51.25 | 15 | 3mo ago | |
| Honest | Llama 3.1 8B - LFT w. SH-Dgender (BaseCDA) | Honest Score: 11.7 | 15 | 3mo ago | |
| Reddit Bias | Llama 3.1 8B - Pretrained model (baseline 1) | t-value: -4.7523 | 15 | 3mo ago | |
| Male-biased prompts | Manually curated | Male Bias (Base): 0.53 | 14 | 3mo ago | |
| CrowS-Pairs | CDA | CS Score: 50.01 | 13 | 2mo ago | |
| HolisticBias | PaCE | GN Score: 66.2 | 10 | 2mo ago | |
| SexualOrientation | iPASwo | Mean Improvement: 0.1 | 9 | 16d ago | |
| Religion | iPASwo | Mean Improvement: 0.11 | 9 | 16d ago | |
| Race & Gender | iPASwo | Mean Improvement: 20 | 9 | 16d ago | |
| RaceEthnicity | iPASa | Mean Improvement: 15 | 9 | 16d ago | |
| PhysicalAppearance | iPASa | Mean Improvement: 0.07 | 9 | 16d ago | |
| Nationality | iPASwo | Mean Improvement: 0.12 | 9 | 16d ago | |
| GenderIdentity | iPASwo | Mean Improvement: 0.12 | 9 | 16d ago | |
| DisabilityStatus | iPASwo | Mean Improvement: 0.17 | 9 | 16d ago | |