General Reasoning Average on PostTrainBench
[Chart: Average (%) by method. Labeled point: Official Instruct Model, 44.81. Date shown: May 31, 2026.]
Updated 1d ago
Evaluation Results
| Method | Method details | Date | Average (%) |
|---|---|---|---|
| Official Instruct Model | Protocol=Official Inst... | 2026.05 | 44.81 |
| GLM-4.7 & ANDES (Ours) | Protocol=Proposed Meth... | 2026.05 | 31.91 |
| Opus-4.6 (1M) | Protocol=Agent-Post-Tr... | 2026.05 | 28.52 |
| Opus-4.7 (xHigh) | Protocol=Agent-Post-Tr... | 2026.05 | 25.28 |
| Opus-4.6 | Protocol=Agent-Post-Tr... | 2026.05 | 23.16 |
| Gemini-3.1-Pro | Protocol=Agent-Post-Tr... | 2026.05 | 22.08 |
| GPT-5.2 | Protocol=Agent-Post-Tr... | 2026.05 | 21.26 |
| GPT-5.4 (High) | Protocol=Agent-Post-Tr... | 2026.05 | 20.72 |
| GLM-4.7 (Scaffold-only) | Protocol=Proposed Meth... | 2026.05 | 20.12 |
| MiniMax-M2.5 | Protocol=Agent-Post-Tr... | 2026.05 | 9.04 |
| GPT-5.1-Codex-Max | Protocol=Agent-Post-Tr... | 2026.05 | 8.21 |
| MiniMax-M2.1 | Protocol=Agent-Post-Tr... | 2026.05 | 5.3 |
| Base Model (SmolLM3-3B) | Protocol=Zero-Shot, Ba... | 2026.05 | 4.52 |
| Kimi-K2-Thinking | Protocol=Agent-Post-Tr... | 2026.05 | 4.52 |
| Qwen3-Max | Protocol=Agent-Post-Tr... | 2026.05 | 4.5 |
| Sonnet-4.5 | Protocol=Agent-Post-Tr... | 2026.05 | 3.99 |
| GLM-4.7 (OpenCode) | Protocol=Proposed Meth... | 2026.05 | 3.53 |