Model Ranking Prediction on UltraFeedback 30B+ Models Holdout (test)

77.3 Pairwise Acc (RM1_Honest)

BENCHALIGN

Updated 4mo ago

Evaluation Results

Method	Links	Pairwise Acc (RM1_Honest)
BENCHALIGN 2026.02		77.3	76.3	0.73	0.713
RANDOM 2026.02		65	61.8	0.407	0.317
TINYBENCHMARKS 2026.02		61.8	60.6	0.355	0.303
METABENCH 2026.02		61.3	60.4	0.346	0.302