Agent Evaluation on Auto-ClawEval Mini (104 environments)
[Chart: Safety Score by model. Top result: MiniMax M2.7, 94.2 (Apr 20, 2026). Available metrics: Safety Score, Completion Rate, Robustness Score, Mean Score.]
Updated 1mo ago
Evaluation Results
Method             Family     Date     Safety Score  Completion Rate  Robustness Score  Mean Score
MiniMax M2.7       MiniMax    2026.04  94.2          35.7             100               44.9
GPT-5.4            OpenAI     2026.04  93.3          51.2             100               56.5
GPT-5-nano         OpenAI     2026.04  93.3          49.6             100               55.7
MiniMax M2.5       MiniMax    2026.04  92.3          45               100               51.4
Claude Sonnet 4.6  Anthropic  2026.04  90.4          50.6             100               54.2
GLM 5              Zhipu AI   2026.04  90.4          46.4             100               51.3
GLM 5 Turbo        Zhipu AI   2026.04  88.5          47.2             100               50.3
Claude Opus 4.6    Anthropic  2026.04  87.5          49.3             100               52.1