Mathematical Reasoning on MATH (Accuracy, Response Tokens, Length Reduction)
[Chart: Accuracy over time. Top result: GRPO+FIRSTN, 46.74 (May 27, 2026). Selectable metrics: Accuracy, Response Tokens, Length Reduction (%). Updated 6d ago.]
Evaluation Results
Method        Configuration              Date     Accuracy  Response Tokens  Length Reduction (%)
GRPO+FIRSTN   Model=Llama3.2-3B-Inst...  2026.05  46.74     371.49           8
GRPO          Model=Llama3.2-3B-Inst...  2026.05  46.07     403.94           -
CPPO          Model=Llama3.2-3B-Inst...  2026.05  45.28     359.8            10.9
Pair          Model=Llama3.2-3B-Inst...  2026.05  45.02     163.15           59.6
BPPO          Model=Llama3.2-3B-Inst...  2026.05  44.76     154.1            61.9
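
The Length Reduction (%) values appear consistent with the relative drop in Response Tokens against the GRPO row. The page does not state this formula, so treat it as an inference from the listed numbers. A minimal Python sketch under that assumption (the function name `length_reduction` and the choice of GRPO as baseline are illustrative, not taken from the source):

```python
# Response Tokens as listed in the evaluation results.
response_tokens = {
    "GRPO": 403.94,
    "GRPO+FIRSTN": 371.49,
    "CPPO": 359.8,
    "Pair": 163.15,
    "BPPO": 154.1,
}


def length_reduction(tokens: float, baseline_tokens: float) -> float:
    """Percentage drop in response length relative to a baseline.

    Assumed formula: 100 * (baseline - tokens) / baseline.
    """
    return 100.0 * (baseline_tokens - tokens) / baseline_tokens


# Assumption: GRPO is the reference row (it is the only one listed with "-").
baseline = response_tokens["GRPO"]

reductions = {
    method: round(length_reduction(tokens, baseline), 1)
    for method, tokens in response_tokens.items()
    if method != "GRPO"
}

for method, pct in reductions.items():
    print(f"{method}: {pct}%")
```

Under this assumption the computed values match the listed column to one decimal place, which suggests the accuracy differences (at most about 2 points across methods) should be read alongside response lengths that differ by as much as roughly 60%.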