Mathematical Reasoning on Minerva Math Dataset (avg.@8)
[Chart: Average Accuracy @8 by date. Best result: SFT, 43.66 (Aug 25, 2025).]
Evaluation Results
Method          Backbone            Date     Average Accuracy @8
SFT             Qwen2.5-7B-In...    2025.08  43.66
PSFT (warm-up)  Qwen2.5-7B-In...    2025.08  43.38
PSFT            Qwen2.5-7B-In...    2025.08  43.33
SFT-KL          Qwen2.5-7B-In...    2025.08  42.19
Base            Qwen2.5-7B-In...    2025.08  40.53
PSFT (warm-up)  Llama3.1-8B-I...    2025.08  33.64
PSFT            Llama3.1-8B-I...    2025.08  32.4
SFT             Llama3.1-8B-I...    2025.08  32.17
SFT-KL          Llama3.1-8B-I...    2025.08  26.75
Base            Llama3.1-8B-I...    2025.08  25.87
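
The page does not define the metric, so the following is only a sketch under a stated assumption: that "Average Accuracy @8" (avg.@8) means sampling 8 responses per problem, scoring each as correct or incorrect, and averaging the per-problem accuracy over the dataset. The function name `average_accuracy_at_k` and the input layout are hypothetical and not taken from the leaderboard.

```python
from statistics import mean


def average_accuracy_at_k(correct: list[list[bool]], k: int = 8) -> float:
    """Return mean accuracy over k sampled responses per problem, in percent.

    ASSUMPTION: this is the usual reading of avg.@k; the leaderboard itself
    does not spell out the formula.

    correct[i][j] is True when the j-th sampled response to problem i
    was judged correct.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if len(samples) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(samples)}")
        # Fraction of the k samples that were correct for this problem.
        per_problem.append(sum(samples) / k)
    # Average across problems and express as a percentage.
    return 100.0 * mean(per_problem)


# Two toy problems: all 8 samples correct, then 4 of 8 correct.
toy = [[True] * 8, [True] * 4 + [False] * 4]
print(average_accuracy_at_k(toy))  # 75.0
```

Under this reading, a score such as 43.66 would mean that a sampled response is correct about 43.66% of the time on average, which is a stricter measure than pass@8 (at least one of 8 correct).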