Model Ranking Prediction on UltraFeedback 70B+ Models Holdout (test)
Best result: 77.4 Pairwise Acc (RM1_Honest), BENCHALIGN, Feb 2, 2026
[Chart omitted. Selectable metrics: Pairwise Acc (RM1_Honest), Pairwise Acc (RM2_Honest), Spearman Rho (RM1_Honest), Spearman Rho (RM2_Honest)]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Pairwise Acc (RM1_Honest) | Pairwise Acc (RM2_Honest) | Spearman Rho (RM1_Honest) | Spearman Rho (RM2_Honest) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Holdout=70B+ Models | 2026.02 | 77.4 | 74.6 | 0.706 | 0.659 |
| RANDOM | Holdout=70B+ Models | 2026.02 | 62.1 | 60.5 | 0.28 | 0.237 |
| METABENCH | Holdout=70B+ Models | 2026.02 | 58.1 | 57.5 | 0.246 | 0.216 |
| TINYBENCHMARKS | Holdout=70B+ Models | 2026.02 | 56.1 | 57 | 0.197 | 0.197 |