Pairwise Preference Ranking on UltraFeedback 2% holdout (test)
[Chart: Pairwise Acc (RM1-Honest). Highlighted point: BENCHALIGN, 89.1, Feb 2, 2026. Selectable metrics: Pairwise Acc (RM1-Honest), Pairwise Acc (RM2-Honest), Spearman Rho (RM1-Honest), Spearman Rho (RM2-Honest).]
Updated 4d ago
Evaluation Results
Method          Model set type  Date     Pairwise Acc   Pairwise Acc   Spearman Rho   Spearman Rho
                                         (RM1-Honest)   (RM2-Honest)   (RM1-Honest)   (RM2-Honest)
BENCHALIGN      Arbitrary       2026.02  89.1           89.1           0.933          0.929
METABENCH       Arbitrary       2026.02  74.6           74.8           0.698          0.689
TINYBENCHMARKS  Arbitrary       2026.02  74.2           74.1           0.69           0.679
RANDOM          Arbitrary       2026.02  73.9           73.3           0.666          0.655
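
This page does not define its metrics, so the following is only a rough sketch of how a pairwise accuracy and a Spearman rank correlation are commonly computed between a predicted score list and a reference score list. The function names, the handling of ties, and the toy scores are assumptions made for illustration; they are not taken from the benchmark, and the leaderboard's actual evaluation protocol may differ.

```python
from itertools import combinations
from math import sqrt


def pairwise_accuracy(predicted, reference):
    """Fraction of item pairs that both score lists order the same way.

    Assumption: pairs tied in the reference are skipped, and a tie in the
    prediction counts as a disagreement.
    """
    agree = total = 0
    for i, j in combinations(range(len(reference)), 2):
        ref_diff = reference[i] - reference[j]
        if ref_diff == 0:
            continue  # no defined order for this pair in the reference
        total += 1
        if (predicted[i] - predicted[j]) * ref_diff > 0:
            agree += 1
    return agree / total if total else float("nan")


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(predicted, reference):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rp, rr = average_ranks(predicted), average_ranks(reference)
    n = len(rp)
    mean_p, mean_r = sum(rp) / n, sum(rr) / n
    cov = sum((a - mean_p) * (b - mean_r) for a, b in zip(rp, rr))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_r = sum((b - mean_r) ** 2 for b in rr)
    return cov / sqrt(var_p * var_r)


# Hypothetical toy scores for four items (not benchmark data).
toy_predicted = [0.1, 0.4, 0.3, 0.9]
toy_reference = [1.0, 2.0, 3.0, 4.0]
print(pairwise_accuracy(toy_predicted, toy_reference))
print(spearman_rho(toy_predicted, toy_reference))
```

In this toy case one of the six pairs is ordered differently by the two lists, so the pairwise accuracy is 5/6 as a fraction. The table above appears to report pairwise accuracy on a 0 to 100 scale, although the page does not state the unit.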