| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| CALVIN ABC-D | FALCON | Task 1 Success Rate | 98.4 | 34 | 1mo ago |
| CALVIN ABC→D Zero-shot | FOFPred | Task 1 Success Rate | 98.8 | 16 | 1mo ago |
| Long-horizon tasks (test) | MIND-V | PFC Score | 0.445 | 6 | 1mo ago |
| RLBench | CoT-VLA | Pick Cup Success Rate | 86 | 5 | 1mo ago |
| Real-World Execution | Action-Sketcher | Tidy Table Success Rate | 52 | 4 | 1mo ago |
| RoboTwin Simulation 2.0 | Action-Sketcher | Stack Blocks | 34.5 | 4 | 1mo ago |
| AIRBOT Play real-world | CLOVER | Sub-task 1 Success Rate | 93.3 | 4 | 1mo ago |
| SayCan Kitchen1 | SayCan w/ Gato | Planning Success Rate | 87 | 4 | 1mo ago |
| SortToCabinet Synthesized | | Success Rate | 28 | 3 | 6d ago |
| PackStationery Synthesized | | Success Rate | 100 | 3 | 6d ago |
| PackBreads Synthesized | | Success Rate | 54 | 3 | 6d ago |
| AutoCheckout Synthesized | | Success Rate | 54 | 3 | 6d ago |
| Real-world Unseen Lighting | PALM | Success Rate (Step 1) | 80 | 3 | 1mo ago |
| Real-world (Visual Distraction) | PALM | Success Rate (Step 1) | 85 | 3 | 1mo ago |
| Real-world Random Localization | PALM | Success Rate (Step 1) | 70 | 3 | 1mo ago |
| SayCan Kitchen2 | SayCan w/ Gato | Planning Success Rate | 0.87 | 3 | 1mo ago |
| Pick-and-Place Three Times | GR00T N1.5 + HAMLET | Success Rate | 37.5 | 2 | 3d ago |