Agent Behavior Adaptation on BlocksWorld (BW) (test)
[Chart: Loop Ratio over time. Plotted point: Mistral-7B-Instruct, Loop Ratio 51, Feb 2, 2026.]
Updated 4d ago
Evaluation Results
| Method | Model Type | Date | Loop Ratio |
|---|---|---|---|
| Mistral-7B-Instruct | Non-thinkin... | 2026.02 | 51 |
| Qwen3-4B-Thinking | Thinking Model | 2026.02 | 28.5 |
| Qwen3-4B-Instruct | Non-thinkin... | 2026.02 | 15.8 |
| Phi-4 | Non-thinkin... | 2026.02 | 12.3 |
| Glm-4-9B-Chat | Non-thinkin... | 2026.02 | 7.6 |
| Ministral-3-14B-Instruct | Non-thinkin... | 2026.02 | 6 |
| Phi-4-reasoning | Thinking Model | 2026.02 | 5.2 |
| Llama-3.1-8B-Instruct | Non-thinkin... | 2026.02 | 3.2 |
| Qwen3-30B-A3B-Thinking | Thinking Model | 2026.02 | 2 |
| Glm-4-32B-0414 | Non-thinkin... | 2026.02 | 1.2 |
| Qwen3-30B-A3B-Instruct | Non-thinkin... | 2026.02 | 1 |
| gpt-oss-120b | Thinking Model | 2026.02 | 1 |
| Llama-3.3-70B-Instruct | Non-thinkin... | 2026.02 | 0 |
| DeepSeek-V3.2 | Non-thinkin... | 2026.02 | 0 |
| Gemini 2.5 Flash | Non-thinkin... | 2026.02 | 0 |
| DeepSeek-R1 | Thinking Model | 2026.02 | 0 |
| Gemini 2.5 Pro | Thinking Model | 2026.02 | 0 |