| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Performance Estimation | JigSaw | MAE 0.001 | 198 |
| Visual Reasoning | Jigsaw | Accuracy 88.6 | 40 |
| Detoxification | Jigsaw (test) | Perplexity (PPL) 20.8 | 29 |
| Spatial Configuration | Jigsaw | Metric 299 | 12 |
| Binary classification | jigsaw | ROC AUC 0.97 | 11 |
| Toxicity Classification | Jigsaw dataset | Rescue Rate 44.2 | 9 |
| Fairness Evaluation | Jigsaw | BiasAUC 75.6 | 9 |
| Classification | Jigsaw Text + Tabular | Accuracy 95.94 | 8 |
| Binary Classification | Toxic Jigsaw | Competition Score 0.987 | 7 |
| Toxicity Detection | Jigsaw Perspective-based Negated Private (test) | Accuracy 87 | 7 |
| Fairness-aware Classification | Jigsaw | Training Time (min) 30 | 7 |
| Visual Manipulation | Jigsaw Res5 | Accuracy 4.3 | 6 |
| Visual Manipulation | Jigsaw Res4 | Accuracy 11.3 | 6 |
| Visual Manipulation | Jigsaw Res3 | Accuracy 21 | 6 |
| Toxicity classification | Jigsaw (test) | Accuracy 96 | 6 |
| Visual puzzle solving | Jigsaw R1 (test) | Accuracy (2x1) 61.9 | 6 |
| Part-based Image Generation | Jigsaw | FID 160.1 | 5 |
| Alignment Audit | Jigsaw Toxic Comment | Average Treatment Effect (ATE) 0 | 5 |
| Toxicity Classification | Jigsaw-ML | AUC 98.4 | 2 |
| Toxicity Classification | Jigsaw-BL | AUC 97.1 | 2 |
| Multi-label Toxic Content Classification | Jigsaw-ML | Attack Success Rate 71.7 | 2 |
| Binary Toxic Content Classification | Jigsaw-BL | Attack Success Rate 99.27 | 2 |