Embodied Task Completion on ALFWorld (OOD)
[Chart: Accuracy and Absolute Accuracy Change (∆). Top result: SKILLGEN, 97.25 accuracy, May 9, 2026.]
Updated 21d ago
Evaluation Results
Method    Model Name      Date     Accuracy  Absolute Accuracy Change (∆)
SKILLGEN  GPT-5.4-Min...  2026.05  97.25      3.92
SKILLGEN  GPT-5.4-Nan...  2026.05  94.9      21.96
SKILLGEN  Gemma-4-26B...  2026.05  93.73     11.37
SKILLGEN  Qwen-2.5-7B...  2026.05  82.35     12.94
SKILLGEN  Grok-4-Fast...  2026.05  80.39     15.29
SKILLGEN  Mistral-Nem...  2026.05  67.06      8.24
SKILLGEN  Llama-3.1-8...  2026.05  65.1      -2.35
SKILLGEN  Claude-Haik...  2026.05  64.71      3.53
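The page does not define how the Absolute Accuracy Change (∆) is computed. As a rough sketch, assuming ∆ is the SKILLGEN accuracy minus the accuracy of the same model without SKILLGEN (an assumption, not something stated here), the implied baseline for each of the results above could be recovered as follows. The function name `implied_baseline` and the `ROWS` list are illustrative only; model names are kept truncated exactly as they appear on the page.

```python
# Rows copied from the results above: (model name as truncated on the page, accuracy, delta).
ROWS = [
    ("GPT-5.4-Min...", 97.25, 3.92),
    ("GPT-5.4-Nan...", 94.9, 21.96),
    ("Gemma-4-26B...", 93.73, 11.37),
    ("Qwen-2.5-7B...", 82.35, 12.94),
    ("Grok-4-Fast...", 80.39, 15.29),
    ("Mistral-Nem...", 67.06, 8.24),
    ("Llama-3.1-8...", 65.1, -2.35),
    ("Claude-Haik...", 64.71, 3.53),
]


def implied_baseline(accuracy: float, delta: float) -> float:
    """Return accuracy - delta, rounded to two decimals.

    ASSUMPTION: delta = accuracy_with_method - baseline_accuracy,
    expressed in the same units as accuracy. The page does not confirm this.
    """
    return round(accuracy - delta, 2)


if __name__ == "__main__":
    for name, acc, delta in ROWS:
        base = implied_baseline(acc, delta)
        print(f"{name:<16} accuracy={acc:>6} delta={delta:>6} implied baseline={base:>6}")
```

Under that assumption, a negative ∆ (as in the Llama-3.1-8... row) would mean the implied baseline is higher than the reported accuracy.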