Personal Assistant Agent Performance on Claw-Eval general domain 0408

70.8 Pass@3

Claude Opus 4.6

Updated 2mo ago

Evaluation Results

Method	Links	Pass@3
Claude Opus 4.6 2026.05		70.8
GPT 5.4 2026.05		60.2
Qwen3.5 397A17B 2026.05		57.8
Gemini 3.1 Pro 2026.05		55.9
GLM 5 Turbo 2026.05		52.8
MiniMax M2.7 2026.05		49.7
MiniMax M2.5 2026.05		47.2
Kimi K2.5 2026.05		36.6
Orchard-Claw (SFT + RL) 2026.05		31.7
Qwen3-Coder-30B-A3B-Instruct 2026.05		30.4
Nemotron-3-nano-30b-a3b 2026.05		26.1
Orchard-Claw (SFT) 2026.05		22.4
Qwen3-30B-A3B-Thinking 2026.05		14.3