Our new X account is live! Follow @wizwand_team for updates
SOTA Reward Model Evaluation benchmarks and papers with code | Wizwand
Reward Model Evaluation
Benchmarks
Dataset Name            SOTA Method                          Metric              Value   Results  Last Updated
Reward Bench 2 (test)   Distribution-Calibrated Aggregation  RB2 Factuality MAE  0.451   12       4d ago
RewardBench (test)      Consistency                          Kuiper              1.65    8        4d ago
Arena-Hard RU           Qwen3-32B-RM                         Best@8 Score        92.69   5        4d ago
Showing 3 of 3 rows