General Language Model Evaluation on Arena-Hard V2.0
Best result: RM-NLHF, Win Rate 7.03
[Chart: Win Rate (y-axis) by date; plotted point at Jan 12, 2026]
Updated 4d ago
Evaluation Results
Method                      Evaluation Protocol  Date     Win Rate
RM-NLHF                     Fe...                2026.01  7.03
Outcome-only                Fe...                2026.01  6.55
RM-NLHF                     Bo...                2026.01  4.64
Outcome-only                Bo...                2026.01  4.3
Outcome-only                Bo...                2026.01  3.93
RM-NLHF                     Bo...                2026.01  3.85
Outcome-only                Bo...                2026.01  3.69
RM-NLHF                     Bo...                2026.01  3.56
DeepSeek-Distilled-Qwen-7B  Ba...                2026.01  3.39