Pairwise Preference Ranking on Helpsteer 5% holdout (test)
[Chart: Pairwise Accuracy (RM1-Helpful); headline value 84.9 (BENCHALIGN), Feb 2, 2026. Selectable metrics: Pairwise Accuracy (RM1-Helpful), Pairwise Accuracy (RM2-Helpful), Spearman Rho (RM1-Helpful), Spearman Rho (RM2-Helpful).]
Evaluation Results
| Method | Model set type | Date | Pairwise Accuracy (RM1-Helpful) | Pairwise Accuracy (RM2-Helpful) | Spearman Rho (RM1-Helpful) | Spearman Rho (RM2-Helpful) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Arbitrary | 2026.02 | 84.9 | 79.2 | 0.866 | 0.762 |
| METABENCH | Arbitrary | 2026.02 | 70.8 | 60.4 | 0.581 | 0.312 |
| TINYBENCHMARKS | Arbitrary | 2026.02 | 70.4 | 60.1 | 0.569 | 0.307 |
| RANDOM | Arbitrary | 2026.02 | 67 | 58.9 | 0.384 | 0.257 |
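
For readers unfamiliar with the two metric families reported above, the following is a minimal sketch of how pairwise accuracy and Spearman's rho are commonly computed between a reference score list and a predicted one. It is an illustration under assumptions, not this benchmark's evaluation code: the function names, the tie handling (pairs tied in the reference are skipped), and the sample scores are all hypothetical. The page reports pairwise accuracy as a percentage, so the fraction returned here would be multiplied by 100 to compare.

```python
from itertools import combinations
from math import sqrt


def pairwise_accuracy(reference, predicted):
    """Fraction of item pairs that both score lists order the same way.

    Assumption: pairs tied in the reference are skipped; a tie in the
    prediction on a non-tied reference pair counts as a disagreement.
    """
    agree = total = 0
    for i, j in combinations(range(len(reference)), 2):
        ref_diff = reference[i] - reference[j]
        if ref_diff == 0:
            continue
        total += 1
        if ref_diff * (predicted[i] - predicted[j]) > 0:
            agree += 1
    return agree / total if total else float("nan")


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(reference, predicted):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(reference), average_ranks(predicted)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores for four items (not taken from the leaderboard).
reference_scores = [0.9, 0.7, 0.5, 0.3]
predicted_scores = [0.8, 0.75, 0.2, 0.4]

accuracy = pairwise_accuracy(reference_scores, predicted_scores)
rho = spearman_rho(reference_scores, predicted_scores)
print(f"pairwise accuracy: {accuracy:.3f}, spearman rho: {rho:.3f}")
```

Pairwise accuracy only checks whether each pair is ordered correctly, while Spearman's rho also penalizes how far items are displaced in the overall ranking, which is one reason the two columns can diverge for the same method.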