Pairwise Preference Ranking on UltraFeedback 5% holdout (test)
[Chart: Pairwise Accuracy (RM1-Honest); top entry BENCHALIGN at 87, Feb 2, 2026]
Evaluation Results
| Method | Model set type | Date | Pairwise Accuracy (RM1-Honest) | Pairwise Accuracy (RM2-Honest) | Spearman Rho (RM1-Honest) | Spearman Rho (RM2-Honest) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Arbitrary | 2026.02 | 87 | 86.7 | 0.905 | 0.899 |
| METABENCH | Arbitrary | 2026.02 | 71.8 | 71.5 | 0.627 | 0.611 |
| RANDOM | Arbitrary | 2026.02 | 71.5 | 70.6 | 0.614 | 0.594 |
| TINYBENCHMARKS | Arbitrary | 2026.02 | 71.1 | 70.9 | 0.608 | 0.593 |
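
The page reports pairwise accuracy and Spearman rho but does not state how this benchmark computes them. As a rough orientation only, the sketch below shows the common textbook definitions: pairwise accuracy as the share of item pairs that a predicted scoring orders the same way as a reference scoring, and Spearman rho as the Pearson correlation of the two rank vectors. The function names, the handling of ties, and the example scores are all assumptions made for illustration, not the benchmark's evaluation code.

```python
from itertools import combinations


def pairwise_accuracy(true_scores, pred_scores):
    """Share of item pairs ordered the same way by both scorings.

    Assumption: pairs tied in the reference scores are skipped, and a
    predicted tie on a non-tied reference pair counts as a miss.
    """
    agree = total = 0
    for i, j in combinations(range(len(true_scores)), 2):
        t = true_scores[i] - true_scores[j]
        if t == 0:
            continue  # no reference preference for this pair
        p = pred_scores[i] - pred_scores[j]
        total += 1
        if t * p > 0:  # same sign means same ordering
            agree += 1
    return agree / total if total else float("nan")


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one scoring is constant
    return cov / (vx * vy) ** 0.5


# Hypothetical scores for four items (illustrative values only).
reference = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.9]
acc = pairwise_accuracy(reference, predicted)
rho = spearman_rho(reference, predicted)
```

In this toy example five of the six pairs agree, so `acc` is 5/6; the table appears to report pairwise accuracy on a 0 to 100 scale, which would correspond to multiplying this fraction by 100.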