General Language Model Evaluation on Arena-Hard V2.0

7.03 Win Rate

RM-NLHF

[Chart: Win Rate (y-axis) by date; data point at Jan 12, 2026]
Updated 4d ago

Evaluation Results

Date      Win Rate
2026.01   7.03
2026.01   6.55
2026.01   4.64
2026.01   4.3
2026.01   3.93
2026.01   3.85
2026.01   3.69
2026.01   3.56
2026.01   3.39