Embodied Instruction Following on ALFWorld official (val)
[Chart: Success Rate over time. Best result: 65.3 (Llama 3.1 405B), Dec 22, 2025.]
Updated 4d ago
Evaluation Results
Method          Scale group    Date     Success Rate
Llama 3.1 405B  Large Scal...  2025.12  65.3
Qwen 2.5 72B    Large Scal...  2025.12  63.5
GPT-OSS 120B    Large Scal...  2025.12  60.4
Llama 3.1 70B   Large Scal...  2025.12  60.1
GenEnv          7B Models,...  2025.12  54.5
GPT-OSS 20B     Large Scal...  2025.12  53.6
Qwen 3 32B      Large Scal...  2025.12  52.3
Qwen 3 14B      Large Scal...  2025.12  37.8
ReSearch        7B Models,...  2025.12  18.7
SearchR1        7B Models,...  2025.12  16.1
Qwen 2.5 7B     7B Models,...  2025.12  14.2
ToRL            7B Models,...  2025.12  8