STEM Reasoning on Minerva (Avg@32, Pass@16)
[Chart: Avg@32 and Pass@16. Leading entry: Training-time reweighting, 56.57 Avg@32; date shown: Mar 23, 2026]
Updated 25d ago
Evaluation Results
Method                     Backbone         Date     Avg@32  Pass@16
Training-time reweighting  Qwen3-8B-Base    2026.03  56.57   76.78
DAPO                       Qwen3-8B-Base    2026.03  55.04   76.98
Training-time reweighting  Qwen2.5-Math-7B  2026.03  49.72   70.37
PPL                        Qwen2.5-Math-7B  2026.03  48.68   68.69
Dominate                   Qwen2.5-Math-7B  2026.03  47.01   64.59
DAPO                       Qwen2.5-Math-7B  2026.03  46.43   69.44
Base                       Qwen3-8B-Base    2026.03  29.8    70.43
Base                       Qwen2.5-Math-7B  2026.03  18.35   61.04
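
The page does not say how Avg@32 and Pass@16 are computed. The sketch below shows one common reading, which is an assumption and not confirmed by this benchmark: each problem is sampled n = 32 times, Avg@32 is the mean per-sample accuracy, and Pass@16 is the standard unbiased pass@k estimator with k = 16 over the same samples. The function names and the example counts are hypothetical.

```python
from math import comb

def avg_at_n(correct_counts, n):
    """Mean per-sample accuracy (in percent) over all problems.

    correct_counts[i] is how many of the n samples for problem i were correct.
    """
    return 100.0 * sum(c / n for c in correct_counts) / len(correct_counts)

def pass_at_k(n, c, k):
    """Unbiased estimate of P(at least one of k drawn samples is correct),
    given c correct samples out of n: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(correct_counts, n, k):
    """pass@k (in percent) averaged over all problems."""
    return 100.0 * sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical per-problem correct counts out of n = 32 samples (not real data).
counts = [0, 32, 16, 1]
avg32 = avg_at_n(counts, n=32)
pass16 = mean_pass_at_k(counts, n=32, k=16)
print(f"Avg@32 = {avg32:.2f}, Pass@16 = {pass16:.2f}")
```

Under this reading, Pass@16 rewards a model for solving a problem at least once in 16 tries, while Avg@32 rewards solving it consistently. That would be consistent with rows in the table where the Pass@16 ordering differs from the Avg@32 ordering.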