| Dataset Name | SOTA Method | Metric | | Updated | Trend |
|---|---|---|---|---|---|
| MMLU CF | TRUE (DAG-based predictor) | Perturbation Success Rate 100 | 12 | 1mo ago | |
| MATH | TRUE (DAG-based predictor) | Perturbed SR 83.3 | 12 | 1mo ago | |
| Experiment 1 (LOOCV) | | R22.64 | 10 | 23d ago | |
| Experiment 1 | | R22.65 | 10 | 23d ago | |
| Chemistry Full (OOD) | GRADER | Success Rate 83.9 | 9 | 1mo ago | |
| Chemistry Jungle OOD | GRADER | Success Rate 8,440 | 9 | 1mo ago | |
| Chemistry Chain (OOD) | GRADER | Success Rate 82.3 | 9 | 1mo ago | |
| Chemistry Collider (OOD) | ICIL | Success Rate 97 | 9 | 1mo ago | |