| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMLU CF | TRUE (DAG-based predictor) | Perturbation Success Rate | 100 | 12 | 4d ago |
| MATH | TRUE (DAG-based predictor) | Perturbed SR | 83.3 | 12 | 4d ago |
| Chemistry Full (OOD) | GRADER | Success Rate | 83.9 | 9 | 4d ago |
| Chemistry Jungle OOD | GRADER | Success Rate | 8,440 | 9 | 4d ago |
| Chemistry Chain (OOD) | GRADER | Success Rate | 82.3 | 9 | 4d ago |
| Chemistry Collider (OOD) | ICIL | Success Rate | 97 | 9 | 4d ago |