| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Detoxification | Jigsaw (test) | Perplexity (PPL) 20.8 | 29 |
| Visual Reasoning | Jigsaw | Accuracy 88.6 | 25 |
| Spatial Configuration | Jigsaw | Metric 299 | 12 |
| Binary Classification | Jigsaw | ROC AUC 0.97 | 11 |
| Fairness Evaluation | Jigsaw | BiasAUC 75.6 | 9 |
| Binary Classification | Toxic Jigsaw | Competition Score 0.987 | 7 |
| Toxicity Detection | Jigsaw Perspective-based Negated Private (test) | Accuracy 87 | 7 |
| Fairness-aware Classification | Jigsaw | Training Time (min) 30 | 7 |
| Alignment Audit | Jigsaw Toxic Comment | Average Treatment Effect (ATE) 0 | 5 |
| Toxicity Classification | Jigsaw-ML | AUC 98.4 | 2 |
| Toxicity Classification | Jigsaw-BL | AUC 97.1 | 2 |
| Multi-label Toxic Content Classification | Jigsaw-ML | Attack Success Rate 71.7 | 2 |
| Binary Toxic Content Classification | Jigsaw-BL | Attack Success Rate 99.27 | 2 |