Model Ranking Prediction on UltraFeedback 30B+ Models Holdout (test)

77.3 Pairwise Acc (RM1_Honest)

BENCHALIGN

[Chart omitted; date shown: Feb 2, 2026]
Updated 4d ago

Evaluation Results

Method   Links
2026.02  77.3  76.3  0.73   0.713
2026.02  65    61.8  0.407  0.317
2026.02  61.8  60.6  0.355  0.303
2026.02  61.3  60.4  0.346  0.302