Real-World Agent on Claw-Eval
[Chart: Pass@3 and Average Score; highlighted point: Claude Opus 4.6, Pass@3 66.3, Mar 29, 2026]
Updated 19d ago
Evaluation Results
| Method | Date | Pass@3 | Average Score |
| Claude Opus 4.6 | 2026.03 | 66.3 | 79.3 |
| GPT-5.4 | 2026.03 | 66.3 | 80.6 |
| GLM-5 | 2026.03 | 57.7 | 73 |
| KAT-Coder-V2 | 2026.03 | 55.6 | 73.4 |
| MiniMax M2.7 | 2026.03 | 51.9 | 70.7 |
| Gemini 3.1 Pro | 2026.03 | 50 | 74.2 |