
Agent Behavior Adaptation on Sudoku (Su) (test)

34.3 Loop Ratio

Qwen3-4B-Thinking

Updated 5mo ago

Evaluation Results

Method	Links	Loop Ratio
Qwen3-4B-Thinking 2026.02		34.3
Qwen3-4B-Instruct 2026.02		22.5
Phi-4 2026.02		21
Mistral-7B-Instruct 2026.02		15.7
Llama-3.1-8B-Instruct 2026.02		14.7
Glm-4-9B-Chat 2026.02		9.5
Ministral-3-14B-Instruct 2026.02		6.9
Glm-4-32B-0414 2026.02		5.8
Qwen3-30B-A3B-Instruct 2026.02		4.2
Llama-3.3-70B-Instruct 2026.02		2.7
Qwen3-30B-A3B-Thinking 2026.02		2.2
Phi-4-reasoning 2026.02		1
Gemini 2.5 Flash 2026.02		0.6
DeepSeek-V3.2 2026.02		0.1
Gemini 2.5 Pro 2026.02		0.1
gpt-oss-120b 2026.02		0
DeepSeek-R1 2026.02		0