Overall Language Model Evaluation on Aggregated Benchmarks (STEM, Code, IF, General)
Top Average Score: 61.7 (GenRM-R-Align-14B, Feb 6, 2026)
Evaluation Results
Method               Reward Model    Date      Average Score
GenRM-R-Align-14B    GenRM-R-A...    2026.02   61.7
GenRM-R-Align-8B     GenRM-R-A...    2026.02   59.7
GenRM-RLVR-14B       GenRM-RLV...    2026.02   59.4
Qwen3-14B-as-GenRM   Qwen3-14B...    2026.02   59.1
Qwen3-8B-as-GenRM    Qwen3-8B-...    2026.02   58.1
GenRM-RLVR-8B        GenRM-RLV...    2026.02   57.6
Qwen3-8B             Baseline        2026.02   53.9