Agent Evaluation on Auto-ClawEval
Chart: Safety score by model, as of Apr 20, 2026 (highest: GPT-5-nano, 93.3). Selectable metrics: Safety, Completion, Robustness, Mean Score.
Evaluation Results
| Method            | Family    | Date    | Safety | Completion | Robustness | Mean Score |
|-------------------|-----------|---------|--------|------------|------------|------------|
| GPT-5-nano        | OpenAI    | 2026.04 | 93.3   | 48.9       | 100        | 54.9       |
| MiniMax M2.5      | MiniMax   | 2026.04 | 93.0   | 35.5       | 100        | 43.6       |
| GPT-5.4           | OpenAI    | 2026.04 | 91.0   | 56.7       | 100        | 58.8       |
| MiniMax M2.7      | MiniMax   | 2026.04 | 90.5   | 43.8       | 100        | 49.4       |
| Claude Sonnet 4.6 | Anthropic | 2026.04 | 90.3   | 50.0       | 100        | 53.7       |
| GLM 5             | Zhipu AI  | 2026.04 | 90.2   | 45.3       | 100        | 50.1       |
| GLM 5 Turbo       | Zhipu AI  | 2026.04 | 89.0   | 46.2       | 100        | 49.8       |
| Claude Opus 4.6   | Anthropic | 2026.04 | 87.3   | 49.7       | 100        | 52.4       |
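The leaderboard is ordered by Safety, but the ranking changes when a different column is used. The short Python sketch below transcribes the table values and re-ranks the models by any column. The names `Result`, `RESULTS`, and `rank_by` are hypothetical, introduced only for illustration. The page does not say how Mean Score is aggregated, and it is not the simple average of the three displayed columns (for GPT-5-nano, (93.3 + 48.9 + 100) / 3 ≈ 80.7, not 54.9), so the sketch uses the published Mean Score as given rather than recomputing it.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One leaderboard row (hypothetical container; values copied from the table)."""
    model: str
    family: str
    safety: float
    completion: float
    robustness: float
    mean_score: float  # published value; the aggregation formula is not stated on the page


# Values transcribed from the Evaluation Results table (all rows dated 2026.04).
RESULTS = [
    Result("GPT-5-nano", "OpenAI", 93.3, 48.9, 100.0, 54.9),
    Result("MiniMax M2.5", "MiniMax", 93.0, 35.5, 100.0, 43.6),
    Result("GPT-5.4", "OpenAI", 91.0, 56.7, 100.0, 58.8),
    Result("MiniMax M2.7", "MiniMax", 90.5, 43.8, 100.0, 49.4),
    Result("Claude Sonnet 4.6", "Anthropic", 90.3, 50.0, 100.0, 53.7),
    Result("GLM 5", "Zhipu AI", 90.2, 45.3, 100.0, 50.1),
    Result("GLM 5 Turbo", "Zhipu AI", 89.0, 46.2, 100.0, 49.8),
    Result("Claude Opus 4.6", "Anthropic", 87.3, 49.7, 100.0, 52.4),
]


def rank_by(metric: str) -> list:
    """Return model names sorted from highest to lowest on the given column."""
    ordered = sorted(RESULTS, key=lambda r: getattr(r, metric), reverse=True)
    return [r.model for r in ordered]


if __name__ == "__main__":
    # Compare the default (Safety) ordering with Completion and Mean Score orderings.
    for metric in ("safety", "completion", "mean_score"):
        print(metric, rank_by(metric))
```

Ranked by Mean Score, GPT-5.4 (58.8) comes first and MiniMax M2.5 (43.6) last, whereas by Safety GPT-5-nano leads. Robustness is 100 for every model, so it does not separate them.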