Mathematical Reasoning on In-Domain Reasoning Suite (MATH, Olympiad, AMC, AIME)
[Chart: scores over time, selectable by metric (MATH Score, Olympiad Score, AMC Score, AIME Score, AIME-25 Score, Average Score). Highlighted point: GSPO + LIE, MATH Score 94.4, Feb 12, 2026.]
Evaluation Results
| Method | Backbone | Date | MATH Score | Olympiad Score | AMC Score | AIME Score | AIME-25 Score | Average Score |
|---|---|---|---|---|---|---|---|---|
| GSPO + LIE | Qwen3-4B | 2026.02 | 94.4 | 67 | 85.3 | 57.7 | 46.4 | 70.2 |
| GSPO | Qwen3-4B | 2026.02 | 94 | 68.1 | 82 | 54.2 | 42.5 | 68.2 |
| Qwen3-4B | Qwen3-4B | 2026.02 | 82.8 | 51.9 | 60.4 | 24.2 | 19.4 | 47.7 |
| GSPO + LIE | Llama-OctoThi... | 2026.02 | 60.8 | 28.1 | 30.3 | 4.5 | 4.4 | 25.6 |
| GSPO | Llama-OctoThi... | 2026.02 | 55.8 | 23.1 | 28.2 | 3.8 | 2.3 | 22.6 |
| OctoThinker | Llama-OctoThi... | 2026.02 | 23.4 | 9 | 10.4 | 1.1 | 0.6 | 8.9 |
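
The page does not say how Average Score is computed. The reported values appear consistent with an unweighted mean of the five benchmark scores (MATH, Olympiad, AMC, AIME, AIME-25), rounded to one decimal. The following is a minimal sketch that checks this assumption against the numbers listed above; the names `RESULTS`, `mean_score`, and `averages_match` are illustrative and not part of the leaderboard.

```python
# Scores copied from the results above, in the order
# MATH, Olympiad, AMC, AIME, AIME-25; the last value is the reported Average Score.
# The "Llama-OctoThi..." backbone label is truncated in the source and kept as-is.
RESULTS = [
    ("GSPO + LIE", "Qwen3-4B", [94.4, 67.0, 85.3, 57.7, 46.4], 70.2),
    ("GSPO", "Qwen3-4B", [94.0, 68.1, 82.0, 54.2, 42.5], 68.2),
    ("Qwen3-4B", "Qwen3-4B", [82.8, 51.9, 60.4, 24.2, 19.4], 47.7),
    ("GSPO + LIE", "Llama-OctoThi...", [60.8, 28.1, 30.3, 4.5, 4.4], 25.6),
    ("GSPO", "Llama-OctoThi...", [55.8, 23.1, 28.2, 3.8, 2.3], 22.6),
    ("OctoThinker", "Llama-OctoThi...", [23.4, 9.0, 10.4, 1.1, 0.6], 8.9),
]


def mean_score(scores):
    """Unweighted mean of the per-benchmark scores (assumed definition)."""
    return sum(scores) / len(scores)


def averages_match(results, tol=0.05):
    """True if every reported average is within rounding distance of the mean."""
    return all(
        abs(mean_score(scores) - reported) <= tol + 1e-9
        for _, _, scores, reported in results
    )


if __name__ == "__main__":
    for method, backbone, scores, reported in RESULTS:
        print(f"{method:12s} {backbone:18s} "
              f"mean={mean_score(scores):.2f} reported={reported}")
    print("all consistent:", averages_match(RESULTS))
```

A tolerance of 0.05 is used because the reported averages carry one decimal place, so any exact mean within half a unit of the last digit would round to the listed value.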