Embodied Task Planning on Robotouille Synchronous
Top result: GiG+Exp, Pass@1 Accuracy 97
[Chart: Pass@1 Accuracy over time; latest point Jan 29, 2026]
Updated 4d ago
Evaluation Results
Method    LLM Backbone    Date      Pass@1 Accuracy
GiG+Exp   Qwen3-235...    2026.01   97
GiG       Qwen3-235...    2026.01   93
GiG       Gemini-2....    2026.01   92
ReAct     Gemini-2....    2026.01   92
GiG       DeepSeek-...    2026.01   91
GiG+Exp   Gemini-2....    2026.01   90
ReCAP     Gemini-2....    2026.01   89
GiG+Exp   DeepSeek-...    2026.01   88
ReAct     Qwen3-235...    2026.01   74
ReCAP     DeepSeek-...    2026.01   72
ReCAP     Qwen3-235...    2026.01   71
ReAct     DeepSeek-...    2026.01   53
CoT       Gemini-2....    2026.01   34
CoT       Qwen3-235...    2026.01   7
CoT       DeepSeek-...    2026.01   2