Personal Assistant Agent Performance on Claw-Eval general domain 0408
Top result: Claude Opus 4.6, 70.8 Pass@3 (May 14, 2026)
Updated 19d ago
Evaluation Results
Method                          Details                     Date      Pass@3
Claude Opus 4.6                 -                           2026.05   70.8
GPT 5.4                         -                           2026.05   60.2
Qwen3.5 397A17B                 -                           2026.05   57.8
Gemini 3.1 Pro                  -                           2026.05   55.9
GLM 5 Turbo                     -                           2026.05   52.8
MiniMax M2.7                    -                           2026.05   49.7
MiniMax M2.5                    -                           2026.05   47.2
Kimi K2.5                       -                           2026.05   36.6
Orchard-Claw (SFT + RL)         Model Size=30B-A3B (~3...   2026.05   31.7
Qwen3-Coder-30B-A3B-Instruct    Model Size=30B-A3B (~3...   2026.05   30.4
Nemotron-3-nano-30b-a3b         Model Size=30B-A3B (~3...   2026.05   26.1
Orchard-Claw (SFT)              Model Size=30B-A3B (~3...   2026.05   22.4
Qwen3-30B-A3B-Thinking          Model Size=30B-A3B (~3...   2026.05   14.3