Embodied Task Planning on ALFWorld standard evaluation set (134 tasks)
[Chart: Pass@1 Accuracy by date. Best result: 97 (GiG), Jan 29, 2026.]
Evaluation Results
Method  Model     Date     Pass@1 Accuracy
GiG     Qwen3     2026.01  97
GiG     DeepSeek  2026.01  97
GiG     Gemini    2026.01  91
ReCAP   Qwen3     2026.01  89
ReCAP   Gemini    2026.01  86
ReCAP   DeepSeek  2026.01  82
ReAct   Qwen3     2026.01  61