Embodied Task Planning on ALFWorld (seen domains)
[Chart: Success Rate (SR) by date. Leading method: TMoW, SR 72.05. Date shown: Jan 30, 2026.]
Updated 4d ago
Evaluation Results
Method        Configuration               Date      Success Rate (SR)   Progress Score (PS)
TMoW          Backbone=Llama-3.2-1B       2026.01   72.05               6.94
LLM+FT        Backbone=Llama-3.2-1B,...   2026.01   51.78               13.22
SayCanPay     Say Backbone=Llama-3.2...   2026.01   51.48               13.19
FLARE         Backbone=Llama-3.2-3B       2026.01   21.22               34.4
LLM-Planner   Backbone=Llama-3.2-3B,...   2026.01   11.67               37.19
ZSP           Backbone=Llama-3.2-3B,...   2026.01   2.32                49.34