Mathematical Reasoning on GSM8K (test) (Accuracy, Reward)
[Chart: Reward and Accuracy by method. Highlighted point: SEA, Reward 7.28, May 26, 2025.]
Updated 1mo ago
Evaluation Results
Method   Details                     Date      Reward   Accuracy
SEA      Base Model=LLaMA-3.2-1...   2025.05     7.28   58
BoN-64   Base Model=LLaMA-3.2-1...   2025.05     1.78   57
BoN-32   Base Model=LLaMA-3.2-1...   2025.05     0.47   46
CBS      Base Model=LLaMA-3.2-1...   2025.05    -0.53   37
BoN-8    Base Model=LLaMA-3.2-1...   2025.05    -1.22   42.5
SFT      Base Model=LLaMA-3.2-1...   2025.05    -1.44   32
ARGS     Base Model=LLaMA-3.2-1...   2025.05    -4.28   20
RS       Base Model=LLaMA-3.2-1...   2025.05    -4.75   29.5