Mathematical Reasoning on MATH 500 (avg.@8)
[Chart: Avg Success Rate (@8) over time. Best result: PSFTwarm-up, 84.68 (Aug 25, 2025).]
Evaluation Results
Method        Backbone            Date      Avg Success Rate (@8)
PSFTwarm-up   Qwen2.5-7B-In...    2025.08   84.68
SFT           Qwen2.5-7B-In...    2025.08   84.1
SFT-KL        Qwen2.5-7B-In...    2025.08   83.55
PSFT          Qwen2.5-7B-In...    2025.08   83.35
Base          Qwen2.5-7B-In...    2025.08   75.05
PSFTwarm-up   Llama3.1-8B-I...    2025.08   74.15
SFT           Llama3.1-8B-I...    2025.08   72.6
PSFT          Llama3.1-8B-I...    2025.08   71.98
SFT-KL        Llama3.1-8B-I...    2025.08   69.83
Base          Llama3.1-8B-I...    2025.08   48.55