
Agent Behavior Adaptation on FrozenLake (FL) (test)

Loop Ratio: 0

DeepSeek-V3.2

Updated 5mo ago

Evaluation Results

Method	Date	Loop Ratio
DeepSeek-V3.2	2026.02	0
DeepSeek-R1	2026.02	0
Gemini 2.5 Pro	2026.02	0.2
gpt-oss-120b	2026.02	0.7
Gemini 2.5 Flash	2026.02	1
Phi-4-reasoning	2026.02	1.4
Ministral-3-14B-Instruct	2026.02	4.3
Glm-4-32B-0414	2026.02	5.1
Qwen3-30B-A3B-Thinking	2026.02	5.3
Llama-3.1-8B-Instruct	2026.02	5.4
Llama-3.3-70B-Instruct	2026.02	9.5
Qwen3-4B-Thinking	2026.02	9.8
Glm-4-9B-Chat	2026.02	16.7
Phi-4	2026.02	24.3
Qwen3-4B-Instruct	2026.02	32
Qwen3-30B-A3B-Instruct	2026.02	37.7
Mistral-7B-Instruct	2026.02	63.3