Model Ranking Prediction on UltraFeedback 30B+ Models Holdout (test)
[Chart: Pairwise Acc (RM1_Honest) over time. Top result: BENCHALIGN, 77.3, Feb 2, 2026.]
Metrics: Pairwise Acc (RM1_Honest), Pairwise Acc (RM2_Honest), Spearman Rho (RM1_Honest), Spearman Rho (RM2_Honest)
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Pairwise Acc (RM1_Honest) | Pairwise Acc (RM2_Honest) | Spearman Rho (RM1_Honest) | Spearman Rho (RM2_Honest) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Holdout=30B+ Models | 2026.02 | 77.3 | 76.3 | 0.73 | 0.713 |
| RANDOM | Holdout=30B+ Models | 2026.02 | 65 | 61.8 | 0.407 | 0.317 |
| TINYBENCHMARKS | Holdout=30B+ Models | 2026.02 | 61.8 | 60.6 | 0.355 | 0.303 |
| METABENCH | Holdout=30B+ Models | 2026.02 | 61.3 | 60.4 | 0.346 | 0.302 |
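The page does not say how the two metric families are computed. As a rough illustration only, the sketch below shows the common textbook definitions: pairwise accuracy as the fraction of model pairs whose relative order is predicted correctly, and Spearman's rho as the Pearson correlation of ranks. The function names, the tie handling, and the example scores are assumptions made for illustration, not details taken from the benchmark. The leaderboard also appears to report pairwise accuracy as a percentage, whereas this sketch returns a fraction.

```python
from itertools import combinations
from statistics import mean


def pairwise_accuracy(predicted, actual):
    """Fraction of item pairs that both score lists order the same way.

    Assumption: pairs tied in `actual` are skipped, and a tie in
    `predicted` counts as a wrong prediction.
    """
    agree = total = 0
    for i, j in combinations(range(len(actual)), 2):
        true_diff = actual[i] - actual[j]
        if true_diff == 0:
            continue  # no ground-truth order for this pair
        total += 1
        if true_diff * (predicted[i] - predicted[j]) > 0:
            agree += 1
    return agree / total if total else float("nan")


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def spearman_rho(predicted, actual):
    """Pearson correlation computed on the ranks of the two score lists."""
    rp, ra = average_ranks(predicted), average_ranks(actual)
    mp, ma = mean(rp), mean(ra)
    cov = sum((x - mp) * (y - ma) for x, y in zip(rp, ra))
    var_p = sum((x - mp) ** 2 for x in rp)
    var_a = sum((y - ma) ** 2 for y in ra)
    if var_p == 0 or var_a == 0:
        return float("nan")  # undefined when one list is constant
    return cov / (var_p * var_a) ** 0.5


# Hypothetical scores for four held-out models (not leaderboard data).
predicted_scores = [0.9, 0.4, 0.7, 0.1]
actual_scores = [0.8, 0.5, 0.3, 0.2]

acc = pairwise_accuracy(predicted_scores, actual_scores)
rho = spearman_rho(predicted_scores, actual_scores)
print(f"pairwise accuracy: {acc:.3f}, Spearman rho: {rho:.3f}")
```

In this toy example one of the six model pairs is ordered incorrectly, which lowers both metrics below 1. If SciPy is available, `scipy.stats.spearmanr` should give the same correlation for the rank step.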