| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Bias Evaluation | Task 2 Persona F | BAD Score 0.3 | 25 |
| Bias Evaluation | Task 2 Persona D | BAD Score 0.007 | 25 |
| Bias Evaluation | Task 2 Persona C | BAD Score -0.18 | 25 |
| Bias Evaluation | Task 2 Persona B | BAD Score -0.007 | 25 |
| Bias Evaluation | Task 2 Persona A | BAD Score -0.006 | 25 |
| Disease Prediction | Task 2 Type 2 Diabetes (test) | AUROC 0.83 | 10 |
| Morphologic and molecular classification | Task 2 | Accuracy 74.8 | 8 |
| Robot Manipulation | Task 2 Concept-rich 1.0 (train) | Probability of Improvement 0.93 | 5 |
| Question Answering | Task 2 Cross-domain | Answer Accuracy 59.4 | 4 |
| Question Answering | Task 2 Single-domain | Answer Accuracy 79.98 | 4 |