Sequential Decision Making on ALFWorld (test)
Best Success Rate: 96.27 (SMaRT)
Figure: Success Rate over time (data point at Oct 20, 2025)
Evaluation Results
Method          Settings                          Date     Success Rate
SMaRT           LLM=GPT-4, Shot setup=...         2025.10  96.27
LLM-as-a-Judge  LLM=GPT-4, Shot setup=...         2025.10  91.79
CoT             LLM=GPT-4, Shot setup=...         2025.10  88.81
SMaRT           LLM=Gemini-1.5, Shot s...         2025.10  83.58
LLM-as-a-Judge  LLM=Gemini-1.5, Shot s...         2025.10  80.59
Direct          LLM=GPT-4, Shot setup=...         2025.10  76.87
CoT             LLM=Gemini-1.5, Shot s...         2025.10  70.89
Direct          LLM=Gemini-1.5, Shot s...         2025.10  55.97