Overall Language Model Evaluation on Aggregated Benchmarks (STEM, Code, IF, General)

Average Score: 61.7

GenRM-R-Align-14B

[Chart: score axis ticks 53.588, 55.694, 57.8, 59.906; date Feb 6, 2026]
Updated 4d ago

Evaluation Results

Date      Score
2026.02   61.7
2026.02   59.7
2026.02   59.4
2026.02   59.1
2026.02   58.1
2026.02   57.6
2026.02   53.9