Model Ranking Prediction on UltraFeedback 13B+ Models Holdout (test)

74.8 Pairwise Accuracy (RM1_Honest)
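This page does not define the metric. As a rough illustration only, pairwise accuracy for a ranking prediction is commonly taken to be the percentage of item pairs whose predicted order matches the reference order. The sketch below assumes that reading; the function name `pairwise_accuracy`, the tie handling, and the percentage scaling are assumptions, not the benchmark's documented procedure.

```python
from itertools import combinations


def pairwise_accuracy(predicted, reference):
    """Percentage of item pairs ranked in the same order by both score lists.

    Assumption: pairs tied in the reference are skipped, and a pair tied in
    the prediction counts as a miss. The benchmark may handle ties differently.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    agree = 0
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        ref_diff = reference[i] - reference[j]
        if ref_diff == 0:
            continue  # no reference order to compare against
        total += 1
        pred_diff = predicted[i] - predicted[j]
        if pred_diff * ref_diff > 0:  # same sign means same order
            agree += 1

    if total == 0:
        raise ValueError("no comparable pairs in reference")
    return 100.0 * agree / total


# Hypothetical scores for three models: predicted vs. reference quality.
print(pairwise_accuracy([0.9, 0.5, 0.7], [3, 1, 2]))
```

Under this definition, random ordering scores about 50, which gives a baseline for reading the accuracy values reported here.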

BENCHALIGN

Feb 2, 2026
Updated 4d ago

Evaluation Results

Method  Links
2026.02  74.8  74.9  0.686  0.691
2026.02  60.3  60  0.28  0.272
2026.02  58.6  57.9  0.248  0.229
2026.02  57.2  56.3  0.209  0.184