
General Reasoning Average on PostTrainBench

Top entry: Official Instruct Model, 44.81% average


Evaluation Results

Method	Date	Average (%)
Official Instruct Model	2026.05	44.81
GLM-4.7 & ANDES (Ours)	2026.05	31.91
Opus-4.6 (1M)	2026.05	28.52
Opus-4.7 (xHigh)	2026.05	25.28
Opus-4.6	2026.05	23.16
Gemini-3.1-Pro	2026.05	22.08
GPT-5.2	2026.05	21.26
GPT-5.4 (High)	2026.05	20.72
GLM-4.7 (Scaffold-only)	2026.05	20.12
MiniMax-M2.5	2026.05	9.04
GPT-5.1-Codex-Max	2026.05	8.21
MiniMax-M2.1	2026.05	5.30
Base Model (SmolLM3-3B)	2026.05	4.52
Kimi-K2-Thinking	2026.05	4.52
Qwen3-Max	2026.05	4.50
Sonnet-4.5	2026.05	3.99
GLM-4.7 (OpenCode)	2026.05	3.53