Autonomous Exploration on ALFWorld
Chart: Steps by method (highlighted: Qwen2.5-7B+GRPO, 11.8 Steps, May 15, 2026). Other metrics: ECC, Delta Task.
Evaluation Results
Method           Model Category  Date     Steps  ECC   Delta Task
Qwen2.5-7B+GRPO  Open-source     2026.05  11.8   11.2  1.3
Qwen3-4B         Open-source     2026.05  19.2   35.5  2.2
LLaMA3.1-8B      Open-source     2026.05  22.5   36.8  1.6
GPT-4.1          Closed-source   2026.05  24.8   52.3  1.9
Qwen3-4B+GRPO    Open-source     2026.05  35.5   32.8  0.5
Qwen2.5-7B       Open-source     2026.05  36.8   19.3  0.3
Claude-Opus-4.5  Closed-source   2026.05  61.9   96.8  6.3