Agent Behavior Adaptation on AlfWorld (AW) (test)
[Chart: Loop Ratio; highlighted point: Mistral-7B-Instruct, 1,040, Feb 2, 2026]
Evaluation Results
| Method | Model Type | Date | Loop Ratio |
|---|---|---|---|
| Mistral-7B-Instruct | Non-thinkin... | 2026.02 | 1,040 |
| gpt-oss-120b | Thinking Model | 2026.02 | 950 |
| Qwen3-4B-Thinking | Thinking Model | 2026.02 | 440 |
| Ministral-3-14B-Instruct | Non-thinkin... | 2026.02 | 270 |
| Phi-4-reasoning | Thinking Model | 2026.02 | 140 |
| Qwen3-30B-A3B-Thinking | Thinking Model | 2026.02 | 130 |
| Llama-3.1-8B-Instruct | Non-thinkin... | 2026.02 | 90 |
| Glm-4-9B-Chat | Non-thinkin... | 2026.02 | 80 |
| Qwen3-30B-A3B-Instruct | Non-thinkin... | 2026.02 | 50 |
| Qwen3-4B-Instruct | Non-thinkin... | 2026.02 | 30 |
| Gemini 2.5 Flash | Non-thinkin... | 2026.02 | 30 |
| Phi-4 | Non-thinkin... | 2026.02 | 20 |
| Llama-3.3-70B-Instruct | Non-thinkin... | 2026.02 | 0 |
| Glm-4-32B-0414 | Non-thinkin... | 2026.02 | 0 |
| DeepSeek-V3.2 | Non-thinkin... | 2026.02 | 0 |
| DeepSeek-R1 | Thinking Model | 2026.02 | 0 |
| Gemini 2.5 Pro | Thinking Model | 2026.02 | 0 |