Model Ranking Prediction on Helpsteer 70B+ Models Holdout (test)
[Chart: Pairwise Acc (RM1) by date. Highlighted point: BENCHALIGN, 77.8 (Feb 2, 2026). Selectable metrics: Pairwise Acc (RM1), Pairwise Acc (RM2), Spearman Rho (RM1), Spearman Rho (RM2).]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Pairwise Acc (RM1) | Pairwise Acc (RM2) | Spearman Rho (RM1) | Spearman Rho (RM2) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Holdout=70B+ Models | 2026.02 | 77.8 | 62 | 0.707 | 0.333 |
| METABENCH | Holdout=70B+ Models | 2026.02 | 56.4 | 48.4 | 0.2 | -0.028 |
| TINYBENCHMARKS | Holdout=70B+ Models | 2026.02 | 54.7 | 49.4 | 0.16 | -0.004 |
| RANDOM | Holdout=70B+ Models | 2026.02 | 53.8 | 46.8 | 0.087 | -0.106 |
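
The page does not define its metrics, so the following is only a minimal sketch of how a pairwise accuracy and a Spearman rank correlation are commonly computed when comparing a predicted model ranking against a reference ranking. It assumes the standard textbook definitions, reports pairwise accuracy as a percentage, and skips pairs that are tied in the reference; none of this is confirmed as the leaderboard's exact protocol. The function names and the example scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_accuracy(true_scores, pred_scores):
    """Percent of model pairs that both score lists order the same way."""
    agree = total = 0
    for i, j in combinations(range(len(true_scores)), 2):
        dt = true_scores[i] - true_scores[j]
        if dt == 0:
            continue  # assumption: pairs tied in the reference are skipped
        dp = pred_scores[i] - pred_scores[j]
        total += 1
        if dt * dp > 0:  # same sign means the pair is ordered consistently
            agree += 1
    return 100.0 * agree / total


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Hypothetical scores for four models (not taken from the leaderboard).
reference = [0.9, 0.7, 0.5, 0.3]
predicted = [0.8, 0.9, 0.4, 0.2]
acc = pairwise_accuracy(reference, predicted)
rho = spearman_rho(reference, predicted)
```

Under these definitions a random ordering would be expected to score near 50 on pairwise accuracy and near 0 on Spearman rho, which is in line with the RANDOM row.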