Embodied Task Planning on ALFWorld (unseen domains)
Top result: TMoW, 68.83 Success Rate (SR), Jan 30, 2026
Metrics: Success Rate (SR), Progress Score (PS)
Evaluation Results
| Method | Details | Date | Success Rate (SR) | Progress Score (PS) |
| --- | --- | --- | --- | --- |
| TMoW | Backbone=Llama-3.2-1B | 2026.01 | 68.83 | 37.44 |
| SayCanPay | Say Backbone=Llama-3.2... | 2026.01 | 42.04 | 40.64 |
| LLM+FT | Backbone=Llama-3.2-1B,... | 2026.01 | 39.61 | 41.24 |
| FLARE | Backbone=Llama-3.2-3B | 2026.01 | 11.31 | 42.85 |
| LLM-Planner | Backbone=Llama-3.2-3B,... | 2026.01 | 8.46 | 43.54 |
| ZSP | Backbone=Llama-3.2-3B,... | 2026.01 | 2.08 | 49.68 |