Embodied Task Completion on ALFWorld (All tasks)
Best Overall Success Rate: 96 (Evolving-RL)
[Chart: Overall Success Rate over time, Nov 18, 2025 to May 11, 2026]
Updated 6d ago
Evaluation Results

Method            Configuration                Date     Overall Success Rate
Evolving-RL       Skill Injection=true         2026.05  96
Evolving-RL       Skill Injection=false        2026.05  93.1
GRPO              Skill Injection=true         2026.05  83.3
SkillRL           Skill Injection=false        2026.05  81.7
ReflAct           Demos=1-shot, Calls/Ta...    2025.11  80.6
GRPO              Skill Injection=false        2026.05  79.9
ReflexGrad        Demos=None, Calls/Task...    2025.11  75.4
LATS              Demos=1-shot, Calls/Ta...    2025.11  72.7
Tree of Thoughts  Demos=1-shot, Calls/Ta...    2025.11  69.7
Self-Refine       Demos=1-shot, Calls/Ta...    2025.11  68.7
ReAct             Demos=1-shot, Calls/Ta...    2025.11  65.7
Memento           Skill Injection=false        2026.05  53
ExpeL             Skill Injection=false        2026.05  46.3
Base Model        Skill Injection=false        2026.05  45.5
Base Model        Skill Injection=true         2026.05  44.5
ReasoningBank     Skill Injection=false        2026.05  41.1
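
Three methods above (Evolving-RL, GRPO, Base Model) are listed under both Skill Injection settings, so the per-method difference can be worked out directly. The following is a minimal Python sketch of that arithmetic. The scores are copied from the entries above; the dictionary layout and the function name `injection_deltas` are illustrative assumptions, not part of the leaderboard, and the page does not say whether these differences are statistically meaningful.

```python
# Overall Success Rate values copied from the leaderboard entries above,
# keyed by (method, skill_injection). The structure is illustrative only.
scores = {
    ("Evolving-RL", True): 96.0,
    ("Evolving-RL", False): 93.1,
    ("GRPO", True): 83.3,
    ("GRPO", False): 79.9,
    ("Base Model", True): 44.5,
    ("Base Model", False): 45.5,
}


def injection_deltas(scores):
    """Return {method: score with injection - score without}, rounded to 1 decimal."""
    methods = {method for method, _ in scores}
    return {
        method: round(scores[(method, True)] - scores[(method, False)], 1)
        for method in sorted(methods)
        # Only compare methods that were reported under both settings.
        if (method, True) in scores and (method, False) in scores
    }


deltas = injection_deltas(scores)
for method, delta in deltas.items():
    print(f"{method}: {delta:+.1f}")
```

On these numbers, Skill Injection is associated with a gain of 2.9 for Evolving-RL and 3.4 for GRPO, and a drop of 1.0 for Base Model.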