Model Ranking Prediction on Helpsteer 30B+ Models Holdout (test)
[Chart: Pairwise Accuracy (RM1). Highlighted point: BENCHALIGN, 76.5, Feb 2, 2026. Metric options: Pairwise Accuracy (RM1), Pairwise Accuracy (RM2), Spearman Rho (RM1), Spearman Rho (RM2).]
Evaluation Results
| Method | Setting | Date | Pairwise Accuracy (RM1) | Pairwise Accuracy (RM2) | Spearman Rho (RM1) | Spearman Rho (RM2) |
|---|---|---|---|---|---|---|
| BENCHALIGN | Holdout=30B+ Models | 2026.02 | 76.5 | 63.7 | 0.71 | 0.387 |
| TINYBENCHMARKS | Holdout=30B+ Models | 2026.02 | 60.8 | 49.1 | 0.324 | -0.013 |
| METABENCH | Holdout=30B+ Models | 2026.02 | 60.6 | 49.7 | 0.328 | 0.007 |
| RANDOM | Holdout=30B+ Models | 2026.02 | 55.1 | 45.2 | 0.108 | -0.148 |
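This page does not define its metrics, and RM1 and RM2 are not explained here. As a rough sketch only, the code below assumes that pairwise accuracy is the percentage of model pairs whose relative order is predicted correctly, and that Spearman rho is the rank correlation between predicted and actual scores. The function names, the tie-handling rules, and the example scores are all hypothetical and are not taken from the benchmark.

```python
from itertools import combinations


def pairwise_accuracy(predicted, actual):
    """Percentage of model pairs whose relative order is predicted correctly.

    Assumptions of this sketch: pairs tied in `actual` are skipped,
    and a tie in `predicted` counts as a miss.
    """
    correct = total = 0
    for i, j in combinations(range(len(actual)), 2):
        if actual[i] == actual[j]:
            continue
        total += 1
        if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) > 0:
            correct += 1
    return 100.0 * correct / total if total else float("nan")


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def spearman_rho(predicted, actual):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(predicted), average_ranks(actual)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for four held-out models (illustrative, not benchmark data).
predicted_scores = [0.62, 0.55, 0.71, 0.40]
actual_scores = [3.9, 3.1, 3.6, 3.4]

print(pairwise_accuracy(predicted_scores, actual_scores))
print(spearman_rho(predicted_scores, actual_scores))
```

Under these assumptions, a method that ranks pairs at chance would score near 50 on pairwise accuracy and near 0 on Spearman rho, which may help when reading the RANDOM row.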