Next-state Prediction Accuracy on ALFWorld (AW)
[Chart: EM Accuracy by date. Best result: Qwen2.5-7B, 99.87 EM Accuracy (Dec 21, 2025). Updated 2d ago.]
Evaluation Results
Method              Evaluation Protocol   Date      EM Accuracy
Qwen2.5-7B          SFT                   2025.12   99.87
Llama3.1-8B         SFT                   2025.12   99.71
Claude-sonnet-4.5   Fe...                 2025.12   77.04
GPT-5               Fe...                 2025.12   67.13
Claude-sonnet-4.5   Ze...                 2025.12   64.73
GPT-4o-mini         Fe...                 2025.12   63.79
GPT-4.1             Fe...                 2025.12   63.37
GPT-4-turbo         Fe...                 2025.12   62.56
Gemini-2.5-flash    Fe...                 2025.12   61.85
GPT-4o              Fe...                 2025.12   56.88
Gemini-2.5-flash    Ze...                 2025.12   50
GPT-4o-mini         Ze...                 2025.12   45.2
GPT-4o              Ze...                 2025.12   44.45
GPT-4.1             Ze...                 2025.12   43.56
GPT-4-turbo         Ze...                 2025.12   42.64
GPT-5               Ze...                 2025.12   35.09
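
For readers unfamiliar with the metric, here is a minimal sketch of how an exact-match (EM) accuracy score like the ones above could be computed. It assumes that EM means a predicted next state counts as correct only when it equals the reference string after whitespace normalization; the benchmark's actual normalization rules are not stated on this page, and the function name `exact_match_accuracy` is hypothetical.

```python
def exact_match_accuracy(predictions, references):
    """Return the percentage of predictions that exactly match their reference.

    Assumption: strings are compared after collapsing runs of whitespace.
    The benchmark's real matching rules may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(text):
        # Collapse whitespace runs and strip leading/trailing spaces.
        return " ".join(text.split())

    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)
```

Under this assumption, a model that matches one of two reference states exactly would score 50.0.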