Mathematics on AIME 25 (Avg@32)

Best reported result: GLM 4.6, 80.2 (Avg@32)


Evaluation Results

Method	Date	Avg@32 (%)
GLM 4.6	2025.12	80.2
LongCat-Flash Exp-Chat	2025.12	74.9
LongCat-Flash Chat	2025.12	61.3
DeepSeek V3.2	2025.12	56.5
IBTPO	2026.05	15.3
TreeRL	2026.05	14.9
TreePO	2026.05	14.7
IBRO	2026.05	14.5
GRPO w/ Entropy Reg†	2026.05	14.0
Vanilla GRPO	2026.05	13.6
GRPO w/ Clip-higher	2026.05	13.5
Initial Model	2026.05	8.9
IBTPO	2026.05	6.7
TreeRL	2026.05	4.6
Vanilla GRPO	2026.05	4.5
GRPO w/ Clip-higher	2026.05	4.5
TreePO	2026.05	4.5
IBRO	2026.05	4.1
GRPO w/ Entropy Reg†	2026.05	2.2
Initial Model	2026.05	1.7
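This page does not define Avg@32. Assuming the common convention (each benchmark problem is answered 32 times and the reported number is the mean accuracy, in percent, over all problems and all samples), the metric can be sketched in Python as follows. The function name `avg_at_k` and the nested-list input layout are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Mean accuracy in percent over all problems and all k samples each.

    Hypothetical helper: correct[i][j] is True if sample j for problem i
    was graded correct. Assumes every problem has the same k (e.g. k = 32).
    """
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem needs the same nonzero number of samples")
    # Fraction of the k samples that were correct, per problem.
    per_problem = [sum(row) / k for row in correct]
    # Average over problems, scaled to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


if __name__ == "__main__":
    # Two toy problems with k = 4: per-problem accuracy 0.5 and 1.0.
    print(avg_at_k([[True, True, False, False], [True, True, True, True]]))  # 75.0
```

Averaging over all k samples (rather than counting a problem as solved if any sample is correct, as pass@k does) makes the score less sensitive to a single lucky sample, which is one reason sampled-average metrics are used on small benchmarks such as AIME.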