Agent Behavior Adaptation on Sudoku (Su) (test)
[Chart: Loop Ratio by date; highlighted point 34.3 (Qwen3-4B-Thinking), Feb 2, 2026]
Evaluation Results

| Method | Model Type | Date | Loop Ratio |
|---|---|---|---|
| Qwen3-4B-Thinking | Thinking | 2026.02 | 34.3 |
| Qwen3-4B-Instruct | Non-thinking | 2026.02 | 22.5 |
| Phi-4 | Non-thinking | 2026.02 | 21 |
| Mistral-7B-Instruct | Non-thinking | 2026.02 | 15.7 |
| Llama-3.1-8B-Instruct | Non-thinking | 2026.02 | 14.7 |
| Glm-4-9B-Chat | Non-thinking | 2026.02 | 9.5 |
| Ministral-3-14B-Instruct | Non-thinking | 2026.02 | 6.9 |
| Glm-4-32B-0414 | Non-thinking | 2026.02 | 5.8 |
| Qwen3-30B-A3B-Instruct | Non-thinking | 2026.02 | 4.2 |
| Llama-3.3-70B-Instruct | Non-thinking | 2026.02 | 2.7 |
| Qwen3-30B-A3B-Thinking | Thinking | 2026.02 | 2.2 |
| Phi-4-reasoning | Thinking | 2026.02 | 1 |
| Gemini 2.5 Flash | Non-thinking | 2026.02 | 0.6 |
| DeepSeek-V3.2 | Non-thinking | 2026.02 | 0.1 |
| Gemini 2.5 Pro | Thinking | 2026.02 | 0.1 |
| gpt-oss-120b | Thinking | 2026.02 | 0 |
| DeepSeek-R1 | Thinking | 2026.02 | 0 |