Embodied Task Planning on Robotouille Asynchronous (test)
[Chart: Pass@1 Accuracy over time. Top result: GiG+Exp, 86 Pass@1 Accuracy (Jan 29, 2026)]
Updated 4d ago
Evaluation Results
Method  | Configuration             | Links   | Pass@1 Accuracy
GiG+Exp | Backbone=DeepSeek, Pla... | 2026.01 | 86
GiG+Exp | Backbone=Qwen3, Planni... | 2026.01 | 82
GiG     | Backbone=Qwen3, Planni... | 2026.01 | 72
GiG     | Backbone=Gemini, Plann... | 2026.01 | 66
GiG+Exp | Backbone=Gemini, Plann... | 2026.01 | 66
ReAct   | Backbone=Gemini, Plann... | 2026.01 | 60
GiG     | Backbone=DeepSeek, Pla... | 2026.01 | 59
ReCAP   | Backbone=Qwen3, Planni... | 2026.01 | 35
ReAct   | Backbone=Qwen3, Planni... | 2026.01 | 31
ReCAP   | Backbone=DeepSeek, Pla... | 2026.01 | 27
ReCAP   | Backbone=Gemini, Plann... | 2026.01 | 21
ReAct   | Backbone=DeepSeek, Pla... | 2026.01 | 16
CoT     | Backbone=Gemini, Plann... | 2026.01 | 4
CoT     | Backbone=Qwen3, Planni... | 2026.01 | 0
CoT     | Backbone=DeepSeek, Pla... | 2026.01 | 0