| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| R-Dataset zero-shot | AIDE | TSR | 95 | 9 | 1mo ago |
| G-Dataset zero-shot | AIDE | TSR | 94.5 | 9 | 1mo ago |
| Classic (Scene 3) | PlanORN | Success Rate (SR) | 0.66 | 6 | 1mo ago |
| Classic (Scene 2) | OR | Success Rate | 97 | 6 | 1mo ago |
| Classic (Scene 1) | OR | Success Rate (SR) | 100 | 6 | 1mo ago |
| VLABench | π0 | Toy Success Rate | 54 | 4 | 23d ago |
| LEMMA Single-agent | CLARA | Calls per Episode | 10.36 | 4 | 1mo ago |
| Robotic Task Planning Capabilities | - | - | - | 0 | 1mo ago |