| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Physical Reasoning | PHYRE-1B cross-template (test) | AUCCESS | 42.2 | 7 |
| Physical Reasoning | PHYRE-1B within-template (test) | AUCCESS | 86.2 | 7 |
| Physical Reasoning | PHYRE Cross-template 1.0 | Success Rate | 50.86 | 6 |
| Physical Reasoning | PHYRE Within-template 1.0 | Success Rate (AUCCESS) | 85.49 | 6 |
| Physical Reasoning | PHYRE-2B cross-template (test) | AUCCESS | 24.3 | 5 |
| Physical Reasoning | PHYRE cross-template B | AUCCESS | 56.31 | 5 |
| Physical Reasoning | PHYRE within-template B | AUCCESS | 85.49 | 5 |
| Trajectory Prediction | PHYRE-C (test) | Average Prediction Error | 9.22 | 4 |
| Trajectory Prediction | PHYRE-W (t ∈ [T_train, 2 × T_train]) | Average Prediction Error | 11.1 | 4 |
| Trajectory Prediction | PHYRE-W (train) | Average Prediction Error | 1.31 | 4 |
| Planning | PHYRE cross-task generalization B-tier | AUCCESS | 42.2 | 3 |
| Planning | PHYRE within-task generalization B-tier | AUCCESS | 85.2 | 3 |