
Agent Evaluation on Auto-ClawEval

Best Safety score: 93.3 (GPT-5-nano)

Updated 3mo ago

Evaluation Results

Method	Links	Safety	(unlabeled)	(unlabeled)	(unlabeled)
GPT-5-nano 2026.04		93.3	48.9	100	54.9
MiniMax M2.5 2026.04		93	35.5	100	43.6
GPT-5.4 2026.04		91	56.7	100	58.8
MiniMax M2.7 2026.04		90.5	43.8	100	49.4
Claude Sonnet 4.6 2026.04		90.3	50	100	53.7
GLM 5 2026.04		90.2	45.3	100	50.1
GLM 5 Turbo 2026.04		89	46.2	100	49.8
Claude Opus 4.6 2026.04		87.3	49.7	100	52.4