Agent Evaluation on Auto-ClawEval Mini (104 environments)

Safety Score: 94.2

MiniMax M2.7

Updated 3mo ago

Evaluation Results

Method	Links	Safety Score
MiniMax M2.7 2026.04		94.2	35.7	100	44.9
GPT-5.4 2026.04		93.3	51.2	100	56.5
GPT-5-nano 2026.04		93.3	49.6	100	55.7
MiniMax M2.5 2026.04		92.3	45	100	51.4
Claude Sonnet 4.6 2026.04		90.4	50.6	100	54.2
GLM 5 2026.04		90.4	46.4	100	51.3
GLM 5 Turbo 2026.04		88.5	47.2	100	50.3
Claude Opus 4.6 2026.04		87.5	49.3	100	52.1