Pairwise Preference Ranking on Helpsteer 2% holdout (test)
[Chart: Pairwise Acc (RM1) by date; highlighted point: BENCHALIGN, 86.6, Feb 2, 2026. Selectable metrics: Pairwise Acc (RM1), Pairwise Acc (RM2), Spearman Rho (RM1), Spearman Rho (RM2).]
Updated 4d ago
Evaluation Results
| Method | Model set type | Date | Pairwise Acc (RM1) | Pairwise Acc (RM2) | Spearman Rho (RM1) | Spearman Rho (RM2) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Arbitrary | 2026.02 | 86.6 | 80.7 | 0.895 | 0.8 |
| METABENCH | Arbitrary | 2026.02 | 72.7 | 63.2 | 0.624 | 0.386 |
| TINYBENCHMARKS | Arbitrary | 2026.02 | 71.2 | 62.7 | 0.597 | 0.373 |
| RANDOM | Arbitrary | 2026.02 | 67.5 | 58.7 | 0.385 | 0.252 |