
Model Ranking Prediction on UltraFeedback 70B+ Models Holdout (test)

77.4 Pairwise Acc (RM1_Honest)

BENCHALIGN

[Chart: results over time, Feb 2, 2026]
Updated 4d ago

Evaluation Results

Method    Links    Date       Scores
                   2026.02    77.4    74.6    0.706    0.659
                   2026.02    62.1    60.5    0.28     0.237
                   2026.02    58.1    57.5    0.246    0.216
                   2026.02    56.1    57      0.197    0.197