Agent Behavior Adaptation on FrozenLake (FL) (test)
[Chart: Loop Ratio (y-axis) by date; series shown: DeepSeek-V3.2; date shown: Feb 2, 2026]
Updated 4d ago
Evaluation Results
| Method | Model Type | Date | Loop Ratio |
|---|---|---|---|
| DeepSeek-V3.2 | Non-thinkin... | 2026.02 | 0 |
| DeepSeek-R1 | Thinking Model | 2026.02 | 0 |
| Gemini 2.5 Pro | Thinking Model | 2026.02 | 0.2 |
| gpt-oss-120b | Thinking Model | 2026.02 | 0.7 |
| Gemini 2.5 Flash | Non-thinkin... | 2026.02 | 1 |
| Phi-4-reasoning | Thinking Model | 2026.02 | 1.4 |
| Ministral-3-14B-Instruct | Non-thinkin... | 2026.02 | 4.3 |
| Glm-4-32B-0414 | Non-thinkin... | 2026.02 | 5.1 |
| Qwen3-30B-A3B-Thinking | Thinking Model | 2026.02 | 5.3 |
| Llama-3.1-8B-Instruct | Non-thinkin... | 2026.02 | 5.4 |
| Llama-3.3-70B-Instruct | Non-thinkin... | 2026.02 | 9.5 |
| Qwen3-4B-Thinking | Thinking Model | 2026.02 | 9.8 |
| Glm-4-9B-Chat | Non-thinkin... | 2026.02 | 16.7 |
| Phi-4 | Non-thinkin... | 2026.02 | 24.3 |
| Qwen3-4B-Instruct | Non-thinkin... | 2026.02 | 32 |
| Qwen3-30B-A3B-Instruct | Non-thinkin... | 2026.02 | 37.7 |
| Mistral-7B-Instruct | Non-thinkin... | 2026.02 | 63.3 |