| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| R-Dataset zero-shot | AIDE | TSR | 95 | 9 | 3mo ago |
| G-Dataset zero-shot | AIDE | TSR | 94.5 | 9 | 3mo ago |
| Classic (Scene 3) | PlanORN | Success Rate (SR) | 0.66 | 6 | 3mo ago |
| Classic (Scene 2) | OR | Success Rate | 97 | 6 | 3mo ago |
| Classic (Scene 1) | OR | Success Rate (SR) | 100 | 6 | 3mo ago |
| Dual-arm Kitchen Average over Tasks 1-5 (test) | - | Success Rate (SR) | 92 | 5 | 1d ago |
| Dual-arm Kitchen Task 5 Plan (test) | - | Success Rate (SR) | 100 | 5 | 1d ago |
| Dual-arm Kitchen Task 4 Plan (test) | - | Success Rate (SR) | 90 | 5 | 1d ago |
| Dual-arm Kitchen Task 3 Plan (test) | - | Success Rate (SR) | 100 | 5 | 1d ago |
| Dual-arm Kitchen Task 2 Plan (test) | - | Success Rate (SR) | 70 | 5 | 1d ago |
| Dual-arm Kitchen Task 1 Plan (test) | - | Success Rate (SR) | 100 | 5 | 1d ago |
| VLABench | π0 | Toy Success Rate | 54 | 4 | 2mo ago |
| LEMMA Single-agent | CLARA | Calls per Episode | 10.36 | 4 | 2mo ago |
| Robotic Task Planning Capabilities | - | - | - | 0 | 3mo ago |