
Agent Behavior Adaptation on AlfWorld (AW) (test)

Top-ranked entry: Mistral-7B-Instruct, Loop Ratio 1,040

Updated 5mo ago

Evaluation Results

Method	Date	Loop Ratio
Mistral-7B-Instruct	2026.02	1,040
gpt-oss-120b	2026.02	950
Qwen3-4B-Thinking	2026.02	440
Ministral-3-14B-Instruct	2026.02	270
Phi-4-reasoning	2026.02	140
Qwen3-30B-A3B-Thinking	2026.02	130
Llama-3.1-8B-Instruct	2026.02	90
Glm-4-9B-Chat	2026.02	80
Qwen3-30B-A3B-Instruct	2026.02	50
Qwen3-4B-Instruct	2026.02	30
Gemini 2.5 Flash	2026.02	30
Phi-4	2026.02	20
Llama-3.3-70B-Instruct	2026.02	0
Glm-4-32B-0414	2026.02	0
DeepSeek-V3.2	2026.02	0
DeepSeek-R1	2026.02	0
Gemini 2.5 Pro	2026.02	0