
Preference Alignment on UltraFeedback (RM and GPT-4o-mini Evaluators)

71.25 Win Rate (RM Evaluator)

Vanilla Baseline

May 8, 2026
Updated 23d ago

Evaluation Results

Method    Links
2026.05
71.25    69.75    72.45
2026.05
65.5    45.37    57.31
2026.05
48.44    50.38    50.47