Mathematical Reasoning on In-Distribution Benchmarks Summary
Best Average Score: 65.7 (ICPO)
[Chart: Average Score and Improvement over time; data point at Oct 30, 2025]
Evaluation Results
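| Method | Setting | Date | Average Score | Improvement |
|---|---|---|---|---|
| ICPO | Base Model=8B, Trainin... | 2025.10 | 65.7 | 2.2 |
| ICPO† | Base Model=8B, Trainin... | 2025.10 | 65.0 | 1.5 |
| GRPOExpertDomain | Base Model=8B, Trainin... | 2025.10 | 64.6 | 1.1 |
| GRPOExtraRollouts | Base Model=8B, Trainin... | 2025.10 | 64.3 | 0.8 |
| GRPO | Base Model=8B, Trainin... | 2025.10 | 63.5 | - |
| ICPO | Base Model=1.7B, Train... | 2025.10 | 52.5 | 4.1 |
| ICPO† | Base Model=1.7B, Train... | 2025.10 | 51.4 | 3.0 |
| GRPOExtraRollouts | Base Model=1.7B, Train... | 2025.10 | 50.7 | 2.3 |
| GRPOExpertDomain | Base Model=1.7B, Train... | 2025.10 | 49.6 | 1.2 |
| GRPO | Base Model=1.7B, Train... | 2025.10 | 48.4 | - |
| Qwen3 | Base Model=8B, Trainin... | 2025.10 | 48.4 | - |
| Qwen3 | Base Model=1.7B, Train... | 2025.10 | 42.7 | - |

Improvement is the Average Score gain over GRPO with the same base model size (each value equals the difference from the corresponding GRPO row); "-" marks rows with no reported improvement.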
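As an illustration, the Improvement column can be recomputed from the Average Score column. The Python sketch below assumes, based only on the figures in the table, that Improvement is the absolute gain over GRPO trained from the same base model size; the `RESULTS` dictionary and the `improvements` function are hypothetical names introduced here, not part of the benchmark's tooling.

```python
# Average scores transcribed from the Evaluation Results table.
# The Qwen3 base-model rows are omitted because they report no improvement.
RESULTS = {
    "8B": {
        "ICPO": 65.7,
        "ICPO†": 65.0,
        "GRPOExpertDomain": 64.6,
        "GRPOExtraRollouts": 64.3,
        "GRPO": 63.5,
    },
    "1.7B": {
        "ICPO": 52.5,
        "ICPO†": 51.4,
        "GRPOExtraRollouts": 50.7,
        "GRPOExpertDomain": 49.6,
        "GRPO": 48.4,
    },
}


def improvements(results, baseline="GRPO"):
    """Return {base_model: {method: score - baseline_score}}, rounded to 1 decimal.

    Assumption (hypothetical): Improvement is the absolute gain over the
    baseline method trained from the same base model.
    """
    out = {}
    for base_model, scores in results.items():
        ref = scores[baseline]
        # Round to 1 decimal to undo floating-point noise (e.g. 64.6 - 63.5).
        out[base_model] = {
            method: round(score - ref, 1)
            for method, score in scores.items()
            if method != baseline
        }
    return out


if __name__ == "__main__":
    for base_model, gains in improvements(RESULTS).items():
        for method, gain in gains.items():
            print(f"{base_model:>5} {method:<18} +{gain}")
```

Every recomputed value matches the Improvement column, which is consistent with the stated assumption; rounding is needed because differences such as 64.6 - 63.5 are not exact in binary floating point.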