Mathematical Reasoning on GSM8K (Best-of-16 Δ)
[Chart: Best-of-16 Delta by date. Highlighted point: Expert Reasoning Reward Model, 13, Oct 2, 2025.]
Updated 1mo ago
Evaluation Results
Method                          Details                     Date      Best-of-16 Delta
Expert Reasoning Reward Model   Reward Model Backbone=...   2025.10   13
Expert Reasoning Reward Model   Reward Model Backbone=...   2025.10   12.1
Expert Reasoning Reward Model   Reward Model Backbone=...   2025.10   11.3
Expert Reasoning Reward Model   Reward Model Backbone=...   2025.10   8.8
Expert Reasoning Reward Model   Reward Model Backbone=...   2025.10   7.6
Expert Reasoning Reward Model   Reward Model Backbone=...   2025.10   2.8