Reward Model Evaluation on Arena-Hard RU
[Chart: Best@8 Score, Worst@8 Score, Delta BoN. Highlighted point: Qwen3-32B-RM, Best@8 Score 92.69, Dec 11, 2025.]
Evaluation Results
Method                                       Setting   Date      Best@8 Score   Worst@8 Score   Delta BoN
Qwen3-32B-RM                                 N=8       2025.12   92.69          70.48           22.21
Skywork-Reward-V2-Llama-3.1-8B               N=8       2025.12   90.49          77.31           13.18
Skywork-Reward-Gemma-2-27B                   N=8       2025.12   89.05          74.35           14.7
Llama-3.1-Tulu-3-70B-SFT-RM-RB2              N=8       2025.12   87.37          78.47           8.9
Llama-3.3-Nemotron-70B-Reward-Multilingual   N=8       2025.12   85.93          84.91           1.02
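
The page does not define Delta BoN, but the reported values are consistent with Delta BoN = Best@8 Score - Worst@8 Score. The sketch below checks that reading against the five rows listed here. The relationship is inferred from the numbers, not stated by the source, and the names ROWS and delta_bon are illustrative only.

```python
from decimal import Decimal

# (method, Best@8, Worst@8, reported Delta BoN), copied from the results above.
ROWS = [
    ("Qwen3-32B-RM", "92.69", "70.48", "22.21"),
    ("Skywork-Reward-V2-Llama-3.1-8B", "90.49", "77.31", "13.18"),
    ("Skywork-Reward-Gemma-2-27B", "89.05", "74.35", "14.7"),
    ("Llama-3.1-Tulu-3-70B-SFT-RM-RB2", "87.37", "78.47", "8.9"),
    ("Llama-3.3-Nemotron-70B-Reward-Multilingual", "85.93", "84.91", "1.02"),
]


def delta_bon(best: str, worst: str) -> Decimal:
    """Assumed definition: Best@8 minus Worst@8 (inferred, not from the source)."""
    # Decimal avoids binary floating-point noise when comparing two-decimal scores.
    return Decimal(best) - Decimal(worst)


# Recompute the delta for every method.
deltas = {name: delta_bon(best, worst) for name, best, worst, _ in ROWS}

# Collect any rows where the recomputed delta disagrees with the reported one.
mismatches = [
    name
    for name, best, worst, reported in ROWS
    if delta_bon(best, worst) != Decimal(reported)
]

# The method with the largest gap between its best and worst picks.
widest_gap = max(deltas, key=deltas.get)

if __name__ == "__main__":
    for name, value in deltas.items():
        print(f"{name}: {value}")
    print("mismatches:", mismatches)
    print("widest gap:", widest_gap)
```

Under this reading, a larger Delta BoN means a wider spread between the highest-scoring and lowest-scoring of the eight candidates for that reward model.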