Model Ranking Prediction on UltraFeedback 13B+ Models Holdout (test)
[Chart: Pairwise Accuracy (RM1_Honest). Top entry: BENCHALIGN, 74.8 (Feb 2, 2026).]
Evaluation Results
Method          Setting              Date     Pairwise Accuracy  Pairwise Accuracy  Spearman Rho   Spearman Rho
                                              (RM1_Honest)       (RM2_Honest)       (RM1_Honest)   (RM2_Honest)
BENCHALIGN      Holdout=13B+ Models  2026.02  74.8               74.9               0.686          0.691
RANDOM          Holdout=13B+ Models  2026.02  60.3               60                 0.28           0.272
METABENCH       Holdout=13B+ Models  2026.02  58.6               57.9               0.248          0.229
TINYBENCHMARKS  Holdout=13B+ Models  2026.02  57.2               56.3               0.209          0.184