| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Reasoning and Coding | Out-of-Distribution (ARC-c, GPQA, MMLU-Pro, LiveCodeBench) | ARC-c (pass@1): 96.6 | 16 |
| Debiasing Effectiveness | Out-of-Distribution (OOD) Split | Mean Ratio: 29.31 | 16 |
| Reasoning Generalization | Out-of-Distribution Avg | Avg Score (OOD): 59.7 | 15 |
| Model jamming with fixed melodies | Out of Distribution (test) | Harmony Ratio: 78.4 | 5 |
| Metasurface inverse design | Out-of-Distribution (test) | SG: 92 | 2 |