
Model Ranking Prediction on UltraFeedback 13B+ Models Holdout (test)

Top result: BENCHALIGN, 74.8 Pairwise Accuracy (RM1_Honest)

Updated 4mo ago

Evaluation Results

Method	Pairwise Accuracy (RM1_Honest)	(unlabeled)	(unlabeled)	(unlabeled)
BENCHALIGN 2026.02	74.8	74.9	0.686	0.691
RANDOM 2026.02	60.3	60	0.28	0.272
METABENCH 2026.02	58.6	57.9	0.248	0.229
TINYBENCHMARKS 2026.02	57.2	56.3	0.209	0.184
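This page does not define Pairwise Accuracy or the RM1_Honest setting. For ranking prediction, the metric is commonly read as the percentage of model pairs that the predicted scores order the same way as the reference scores. The sketch below assumes that reading. It skips pairs tied in the reference and counts predicted ties as disagreements; both choices are assumptions. The function name `pairwise_accuracy` and the model scores are hypothetical.

```python
from itertools import combinations


def pairwise_accuracy(predicted, reference):
    """Percentage of model pairs ordered the same way by both score dicts.

    predicted, reference: dict mapping model name -> score (higher = better).
    Assumption: pairs tied in the reference are skipped, and a tie in the
    predicted scores counts as a disagreement.
    """
    agree = total = 0
    for a, b in combinations(sorted(reference), 2):
        ref_diff = reference[a] - reference[b]
        if ref_diff == 0:
            continue  # no ground-truth order for this pair
        pred_diff = predicted[a] - predicted[b]
        total += 1
        if ref_diff * pred_diff > 0:  # same sign means same order
            agree += 1
    return 100.0 * agree / total if total else float("nan")


# Hypothetical scores for four models; only the m1/m2 pair is mis-ordered.
reference = {"m1": 0.9, "m2": 0.7, "m3": 0.5, "m4": 0.3}
predicted = {"m1": 0.8, "m2": 0.85, "m3": 0.4, "m4": 0.2}
print(round(pairwise_accuracy(predicted, reference), 1))  # 83.3
```

Under this reading, a predictor that orders pairs at random scores about 50 in expectation. Any real leaderboard implementation may differ in tie handling and in how pairs are weighted.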