| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| NAVSIM | ELF-VLA | Speed Accuracy | 85.8 | 9 | 1mo ago |
| RoboTwin Hard | RoboPARA | TEI | 2.4 | 7 | 1mo ago |
| RoboTwin Medium | LLM3 | Task Execution Index (TEI) | 2.1 | 7 | 1mo ago |
| RoboTwin Easy | RoboPARA | Task Execution Index (TEI) | 10.6 | 7 | 1mo ago |
| Simulated Tasks All tasks | Gemini | Success Rate | 86.1 | 4 | 1mo ago |
| Simulated Tasks >7 actions (Long split) | Gemini | Success Rate | 65.18 | 4 | 1mo ago |
| Simulated Tasks Medium 3–7 actions | Gemini | Success Rate | 97.96 | 4 | 1mo ago |
| Simulated Tasks ≤2 actions (Short) | Gemini | Success Rate | 97.19 | 4 | 1mo ago |
| ALFRED (val unseen) | Alfred | EM | 64 | 4 | 1mo ago |
| ReasonMap L (long questions) | Ariadne | Weighted Accuracy | 0.0747 | 3 | 4d ago |
| ReasonMap S (short questions) | | Weighted Accuracy | 15.44 | 3 | 4d ago |