| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Regression | Scenario IS2 | Size | 0 | 24 |
| Classification | Scenario IS1 | Model Size | 0 | 24 |
| Change point localization | Scenario 5 | Mismatch Proportion (K_hat != K) | 0.055 | 20 |
| Change point localization | Scenario 3 | Error Proportion (K_hat != K) | 70.5 | 20 |
| Change point localization | Scenario 1 T=300 | Prop. (K_hat != K) | 1 | 10 |
| Change point localization | Scenario 1 T=150 | Error Proportion | 0 | 10 |
| Constrained Motion Planning | Scenario 3 (two Franka Panda manipulators) 1.0 (test) | Success Rate | 100 | 8 |
| Constrained motion planning | Scenario 2 (Two Franka Panda manipulators with closed-chain constraints) (test) | Success Rate | 100 | 8 |
| Text-to-Image Generation | Scenario 4 | Similarity (95th Percentile) | 0.9262 | 8 |
| Traffic Signal Control | Scenario VISSIM corridor 1 | ANP | 1,749.57 | 7 |
| Simultaneous Exploration and Inspection | Scenario C | Finish Rate Avg | 98.7 | 7 |
| Multi-robot motion planning | Scenario 3 Four-arm setup | Planning Time (Q1) | 0.075 | 6 |
| Multi-robot motion planning | Scenario 2 Two-arm setup with obstacle | Time (Q1) | 0.057 | 6 |
| Multi-robot motion planning | Scenario 1 Two-arm setup | Planning Time (Q1) | 0.013 | 6 |
| Autonomous Cyber Operations | Scenario 1 (Sce1) | Mean Reward | -0.67 | 6 |
| Multi-robot motion planning | Scenario Multi-arm complex setup 4 | Q Metric (Q1) | 11 | 5 |
| Causal Discovery | Scenario S3 n=100 synthetic (test) | FDR | 0 | 5 |
| Causal Discovery | Scenario S3 (n=30) synthetic (test) | FDR | 0 | 5 |
| Causal Discovery | Scenario S2 n=30 synthetic (test) | FDR | 10 | 5 |
| Causal Discovery | Scenario S1 (n=100) synthetic (test) | FDR | 2 | 5 |
| Depth map generation | Scenario (test) | Delta 1 Accuracy | 86.3 | 4 |
| Violation Scenario Generation | Scenario S4 | Mean Violation | 6.78 | 3 |
| Violation Scenario Generation | Scenario S3 | Mean Score | 4.11 | 3 |
| Violation Scenario Generation | Scenario S2 | Mean Violation | 3.44 | 3 |
| Violation Scenario Generation | Scenario S1 | Mean Score | 4.9 | 3 |