Pairwise Preference Ranking on Helpsteer 10% holdout (test)
[Chart: Pairwise Acc (RM1-Helpful) on the y-axis; highlighted point: BENCHALIGN, 85.5, Feb 2, 2026. Selectable metrics: Pairwise Acc (RM1-Helpful), Pairwise Acc (RM2-Helpful), Spearman Rho (RM1-Helpful), Spearman Rho (RM2-Helpful).]
Updated 4d ago
Evaluation Results
| Method | Model set type | Date | Pairwise Acc (RM1-Helpful) | Pairwise Acc (RM2-Helpful) | Spearman Rho (RM1-Helpful) | Spearman Rho (RM2-Helpful) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Arbitrary | 2026.02 | 85.5 | 81.4 | 0.876 | 0.804 |
| TINYBENCHMARKS | Arbitrary | 2026.02 | 67.5 | 58.7 | 0.496 | 0.258 |
| METABENCH | Arbitrary | 2026.02 | 67.4 | 58.5 | 0.493 | 0.252 |
| RANDOM | Arbitrary | 2026.02 | 65.2 | 58 | 0.345 | 0.223 |