Model Ranking Prediction on HelpSteer 30B+ Models Holdout (test)
[Chart: Pairwise Accuracy (RM1). Highlighted result: BENCHALIGN, 76.5, Feb 2, 2026. Selectable metrics: Pairwise Accuracy (RM1), Pairwise Accuracy (RM2), Spearman Rho (RM1), Spearman Rho (RM2).]
Updated 4d ago
Evaluation Results
Method          Setting              Date     Pairwise Accuracy (RM1)  Pairwise Accuracy (RM2)  Spearman Rho (RM1)  Spearman Rho (RM2)
BENCHALIGN      Holdout=30B+ Models  2026.02  76.5                     63.7                     0.71                0.387
TINYBENCHMARKS  Holdout=30B+ Models  2026.02  60.8                     49.1                     0.324               -0.013
METABENCH       Holdout=30B+ Models  2026.02  60.6                     49.7                     0.328               0.007
RANDOM          Holdout=30B+ Models  2026.02  55.1                     45.2                     0.108               -0.148
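
The page does not define its metrics, so the following is only a minimal sketch of how pairwise accuracy and Spearman rho are commonly computed when comparing a predicted model ranking against a reference ranking. The function names, the percentage scaling, the tie handling, and the example scores are all assumptions for illustration; they are not taken from the benchmark and the leaderboard's actual procedure may differ.

```python
from itertools import combinations


def pairwise_accuracy(predicted, actual):
    """Percentage of model pairs ordered the same way by both score lists.

    Assumption: pairs tied in the reference scores are skipped, and a tie in
    the predicted scores counts as a miss.
    """
    pairs = [
        (i, j)
        for i, j in combinations(range(len(actual)), 2)
        if actual[i] != actual[j]
    ]
    agree = sum(
        (predicted[i] - predicted[j]) * (actual[i] - actual[j]) > 0
        for i, j in pairs
    )
    return 100.0 * agree / len(pairs)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(predicted, actual):
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(predicted), average_ranks(actual)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for four hypothetical held-out models (not leaderboard data).
predicted_scores = [0.9, 0.4, 0.6, 0.2]
reference_scores = [0.8, 0.5, 0.3, 0.1]
print(pairwise_accuracy(predicted_scores, reference_scores))
print(spearman_rho(predicted_scores, reference_scores))
```

Under these definitions a method that orders pairs at random would land near 50 on pairwise accuracy and near 0 on Spearman rho.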