| Dataset Name | SOTA Method | Metric | Value | Trend |
|---|---|---|---|---|
| VoTa-Bench (Seen) | D2PO | SR (Examine&Light) | 84.72 | 21 |
| ALFWorld (test) | AEC | Success Rate (Avg) | 98.7 | 17 |
| Robotouille Synchronous | GiG+Exp | Pass@1 Accuracy | 97 | 15 |
| Robotouille Asynchronous (test) | GiG+Exp | Pass@1 Accuracy | 86 | 15 |
| VoTa-Bench 1.0 (Unseen) | D2PO | Examine&Light SR | 82.27 | 15 |
| VirtualHome (Seen) | GPT3.5-MCTS | Simple Success | 9,140 | 10 |
| ALFWorld standard evaluation set (134 tasks) | GiG | Pass@1 Accuracy | 97 | 7 |
| RLBench Unseen domains | TMoW | Success Rate | 62.75 | 6 |
| ALFWorld (unseen domains) | TMoW | Success Rate (SR) | 68.83 | 6 |
| VirtualHome (unseen domains) | TMoW | Success Rate | 80.16 | 6 |
| RLBench Seen domains | TMoW | Success Rate | 71.89 | 6 |
| ALFWorld (seen domains) | TMoW | Success Rate (SR) | 72.05 | 6 |
| VirtualHome Novel Apartment (Unseen) | GPT3.5-MCTS | Simple Success Rate | 82.9 | 4 |