| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | Task 1 pooled across three runs 1.0 (test) | Ours Win Count | 24 | 5 |
| Robot Manipulation | Task 1 1.0 (train) | Prob. of Improvement | 79 | 5 |
| Deployment Performance | Task 1 | TTFT (s) | 1.239 | 4 |
| Topology Optimization | Task 1 Boundary Conditions | Min Compliance | 21.15 | 3 |
| Robotic Manipulation | Task 1 Scenario: Bring warm water and an apple | Average Success Rate (Task 1) | 80 | 3 |
| Learning Efficiency | Task 1 Real robot evaluation (holdout set) | Optimality Gap | 0.317 | 3 |