Mathematical Reasoning on Minerva (avg@8)
[Chart: Avg@8 over time, Aug 25, 2025 to Mar 30, 2026. Top result: 62.45 (GRPO w/ Ground-Truth).]
Updated 1mo ago
Evaluation Results
| Method | Tags | Date | Avg@8 |
|---|---|---|---|
| GRPO w/ Ground-Truth | RL Category=Ground-Tru... | 2026.03 | 62.45 |
| GRPO w/ Structure Reward | RL Category=Label-free... | 2026.03 | 61.99 |
| GRPO w/ Majority Voting (TTRL) | RL Category=Label-free... | 2026.03 | 61.86 |
| GRPO w/ Entropy Minimization (EMPO) | RL Category=Label-free... | 2026.03 | 61.44 |
| PPO w/ Structure Reward | RL Category=Label-free... | 2026.03 | 61.08 |
| PPO w/ Ground-Truth | RL Category=Ground-Tru... | 2026.03 | 59.74 |
| Base | | 2026.03 | 53.45 |
| Qwen2.5-7B-Instruct | Training Pipeline=PSFT... | 2025.08 | 46.83 |
| Qwen2.5-7B-Instruct | Training Pipeline=SFT... | 2025.08 | 45.27 |
| Qwen2.5-7B-Instruct | Training Pipeline=SFT | 2025.08 | 43.66 |
| Qwen2.5-7B-Instruct | Training Pipeline=PSFT | 2025.08 | 43.33 |
| Llama3.1-8B-Instruct | Training Pipeline=PSFT... | 2025.08 | 37.55 |
| Llama3.1-8B-Instruct | Training Pipeline=SFT... | 2025.08 | 33.09 |
| Llama3.1-8B-Instruct | Training Pipeline=PSFT | 2025.08 | 32.4 |
| Llama3.1-8B-Instruct | Training Pipeline=SFT | 2025.08 | 32.17 |