Reward Modeling on RM-Bench Hard
Top result: WILDREWARD-8B, Accuracy 0.697 (Feb 9, 2026)
Updated 4d ago
Evaluation Results
| Method | Details | Date | Accuracy |
| --- | --- | --- | --- |
| WILDREWARD-8B | Size=8B | 2026.02 | 0.697 |
| WILDREWARD-4B | Size=4B | 2026.02 | 0.686 |
| Internlm2-20b-reward | Size=20b | 2026.02 | 0.628 |
| Llama-3-OffsetBias-RM-8B | Size=8B | 2026.02 | 0.569 |
| ArmoRM-Llama3-8B-v0.1 | Backbone=Llama3-8B | 2026.02 | 0.558 |
| INF-ORM-Llama3.1-70B | Backbone=Llama3.1-70B | 2026.02 | 0.54 |
| Athene-RM-8B | Size=8B | 2026.02 | 0.514 |
| Skywork-Reward-Llama-3.1-8B-v0.2 | Backbone=Llama-3.1-8B | 2026.02 | 0.493 |
| Llama-3.1-Nemotron-70B | Size=70B | 2026.02 | 0.478 |
| Skywork-Reward-Gemma-2-27B-v0.2 | Backbone=Gemma-2-27B | 2026.02 | 0.421 |